K-MEANS CLUSTERING
WITH ORANGE
IDENTIFY CUSTOMER SEGMENTS
OF A SOCIAL ENTERPRISE TO
CREATE CUSTOMER OFFERS FOR EACH SEGMENT
AUTHOR: ANTHONY MOK
DATE: 18 NOV 2023
EMAIL: XXIAOHAO@YAHOO.COM
WHAT IS ORANGE
• Open-source and Extensible: Freely available, adaptable, and customisable data mining tool
• Visual Programming: Drag-and-drop interface for building data analysis workflows
• Interactive Data Exploration: Quickly understand data patterns and trends using visualisations
• Wide Range of Data Mining Algorithms: Identify patterns, make predictions, and solve data mining problems
PROJECT’S CONTEXT, OBJECTIVE & STRATEGIES
To identify customer segments in order to customise offers for each segment
The Social Enterprise collected data on its customers & wants to make insight-informed decisions
• Explore & clean data for analysis
• Perform K-Means Clustering, in Orange, to find possible segments in the customer data
• Tune the model to improve its performance
• Visualise the findings, share conclusions, and give insight-driven recommendations
EXPLORATORY DATA ANALYSIS
Findings
• Target = Recency_in_Day
• Provides insights into customer behaviour, preferences, and churn risk
• Feature Columns = 9
• Instances = 2,240
• Blanks & Outliers
Age Column | Income Column
23 Blanks  | -
1 Outlier  | 3 Outliers
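The deck does not say how the blanks and outliers were counted. As a minimal sketch, assuming a pandas DataFrame and the common 1.5 × IQR rule for outliers (both are assumptions, not the author's stated method), the counts could be obtained as follows. The helper name `count_blanks_and_outliers` and the toy values are hypothetical.

```python
import numpy as np
import pandas as pd


def count_blanks_and_outliers(series: pd.Series, k: float = 1.5) -> tuple[int, int]:
    """Return (number of blanks, number of IQR outliers) for a numeric column."""
    blanks = int(series.isna().sum())
    values = series.dropna()
    q1, q3 = values.quantile([0.25, 0.75])
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    # A value counts as an outlier if it falls outside the IQR fences.
    outliers = int(((values < lower) | (values > upper)).sum())
    return blanks, outliers


# Made-up values for illustration only; not the Social Enterprise's data.
toy = pd.DataFrame({
    "Age": [34, 41, 29, 52, 47, 38, 124, 45],
    "Income": [42000, 58000, np.nan, 61000, 39000, 55000, 47000, np.nan],
})

summary = {col: count_blanks_and_outliers(toy[col]) for col in toy.columns}
print(summary)
```

Other outlier rules (for example z-scores or visual inspection of a box plot) would give different counts, so the rule should be stated alongside the numbers.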
K-MEANS CLUSTERING WORKFLOW IN ORANGE
LOADING DATA & DEALING WITH BLANKS
The Customer.csv file was imported into the workflow, with the ‘Role’ of ‘Recency_days’ set as ‘Target’, ‘ID’ as ‘meta’, and the rest as ‘features’
Following the Exploratory Data Analysis (EDA), blanks in the ‘Income’ column were imputed with the ‘Average’ of the values in that column
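Outside Orange, the same idea (replacing blanks with the column average) can be sketched in pandas. This is an illustration under the assumption that blanks are read in as NaN; the toy values are made up, and only the column names ‘ID’ and ‘Income’ come from the slides.

```python
import numpy as np
import pandas as pd

# Made-up rows for illustration; only the column names come from the slides.
toy = pd.DataFrame({
    "ID": [1, 2, 3, 4],
    "Income": [40000.0, np.nan, 60000.0, 50000.0],
})

# Series.mean() skips NaN by default, so the average uses known values only.
income_mean = toy["Income"].mean()

# Replace each blank with that average.
toy["Income"] = toy["Income"].fillna(income_mean)
print(toy)
```

Mean imputation keeps every row but pulls the imputed customers towards the centre of the income range, which can slightly blur cluster boundaries.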
EXAMINING RELATIONSHIPS & PATTERNS
Scatter Plots were created to explore the relationships and patterns in the dataset
With ‘Recency_days’ as the ‘Target’, four feature columns were selected for the model: ‘Income’ & ‘Age’ (numerical data) and ‘Marital Status’ & ‘Education’, since these are more informative than the other columns
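K-Means works on numeric distances, so categorical columns such as ‘Marital Status’ and ‘Education’ need a numeric encoding before clustering. The slides do not show how this was handled, so the following is only a sketch of one common approach, assuming one-hot encoding with pandas and standardisation with scikit-learn. The category labels and values are made up.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Made-up rows; the category labels are hypothetical.
toy = pd.DataFrame({
    "Income": [42000.0, 58000.0, 61000.0, 39000.0],
    "Age": [34.0, 41.0, 52.0, 29.0],
    "Marital Status": ["Single", "Married", "Married", "Single"],
    "Education": ["Undergraduate", "Postgraduate", "Undergraduate", "Secondary"],
})
numeric_cols = ["Income", "Age"]
categorical_cols = ["Marital Status", "Education"]

# One-hot encode the categorical columns into 0/1 indicator columns.
encoded = pd.get_dummies(toy, columns=categorical_cols, dtype=float)

# Put the numeric columns on a comparable scale (zero mean, unit variance),
# so that Income does not dominate the distance calculation.
encoded[numeric_cols] = StandardScaler().fit_transform(encoded[numeric_cols])
print(encoded.round(2))
```

With one-hot columns in the mix, distances are partly driven by category membership, which is one reason clusters may end up split along marital status or education.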
IDENTIFYING IDEAL NUMBER OF CLUSTERS
• To determine the ideal number of clusters, Silhouette Scores were calculated for 2 to 12 clusters
• Overall, the Silhouette Scores are positive but relatively low, suggesting that the clustering is fair but that there is still some overlap between clusters
• Clustering parameters can be adjusted to try to improve the separation between clusters
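The scan over 2 to 12 clusters can be reproduced outside Orange. The sketch below assumes scikit-learn and uses synthetic data from `make_blobs` in place of the customer table, so the scores it prints say nothing about the project's actual results.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Synthetic stand-in for the prepared customer features.
X, _ = make_blobs(n_samples=300, centers=3, cluster_std=1.0, random_state=0)

scores = {}
for k in range(2, 13):  # 2 to 12 clusters, as on the slide
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    # Mean silhouette over all points: near 1 means well separated clusters,
    # near 0 means overlapping clusters, negative means likely misassignment.
    scores[k] = silhouette_score(X, labels)

for k, score in scores.items():
    print(f"k={k:2d}  silhouette={score:.3f}")

best_k = max(scores, key=scores.get)
print("Highest silhouette at k =", best_k)
```

The highest score is a starting point rather than a verdict; the choice of k also depends on whether the resulting segments are usable by the business.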
BOOSTING MODEL’S PERFORMANCE & LIMITATIONS
• By default, ‘K-Means++’ & ‘Normalise Columns’ are enabled in the Hyperparameters
• So only ‘Maximum Iterations’ was raised to 100,000 (from 300) and ‘Re-runs’ to 100 (from 10) to boost the performance of the model
• But the Silhouette Scores did not improve in the range of 2 to 12 clusters after these changes, suggesting that the K-Means Clustering Algorithm had already converged to a stable solution
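A rough scikit-learn analogue of these settings is sketched below, assuming that Orange's ‘Maximum Iterations’ and ‘Re-runs’ correspond to `max_iter` and `n_init` (an assumed mapping, not something the slides confirm). The data is synthetic. The `n_iter_` attribute reports how many iterations the best run actually needed, which shows why raising the cap beyond that point changes nothing.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic stand-in data; not the customer dataset.
X, _ = make_blobs(n_samples=300, centers=4, random_state=1)

# Settings mirroring the defaults quoted on the slide.
default = KMeans(n_clusters=3, init="k-means++", max_iter=300,
                 n_init=10, random_state=0).fit(X)

# Settings mirroring the raised values quoted on the slide.
tuned = KMeans(n_clusters=3, init="k-means++", max_iter=100_000,
               n_init=100, random_state=0).fit(X)

# If n_iter_ is far below max_iter, the run stopped because it converged,
# not because it ran out of iterations.
print("default:", default.n_iter_, "iterations, inertia", round(default.inertia_, 2))
print("tuned:  ", tuned.n_iter_, "iterations, inertia", round(tuned.inertia_, 2))
```

More re-runs only help when different starting points lead to different local optima; once the best runs agree, the scores stay flat.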
BOOSTING MODEL’S PERFORMANCE & LIMITATIONS
In this stable state, scores can be increased at the upper end of the cluster range, but this would result in overfitting the model to the dataset
To avoid this outcome, the conservative number of 3 clusters was chosen instead (Silhouette Score = 0.217)
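Once the number of clusters is fixed, each cluster can be profiled by summarising its members. The sketch below assumes scikit-learn and pandas, uses made-up ‘Age’ and ‘Income’ values, and is not the project's actual output.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Made-up customers for illustration only.
toy = pd.DataFrame({
    "Age": [31.0, 35.0, 33.0, 48.0, 52.0, 50.0, 63.0, 67.0, 70.0],
    "Income": [28000.0, 32000.0, 30000.0, 72000.0, 78000.0,
               75000.0, 45000.0, 50000.0, 48000.0],
})

# Scale first so that Age and Income contribute comparably to distances.
X = StandardScaler().fit_transform(toy)

# Fit K-Means with the chosen number of clusters and label each customer.
toy["Cluster"] = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

# Summarise each cluster in the original units to describe the segments.
profile = toy.groupby("Cluster")[["Age", "Income"]].agg(["mean", "min", "max"])
print(profile)
```

Reading the per-cluster ranges in original units (years, dollars) is what turns cluster labels into describable customer segments.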
FINDINGS & CONCLUSIONS
• Maximum income of the customer base is $100,000/annum
• Of the customers in the age range of 30 to 55, half earned below $50,000/annum and could be price-sensitive bargain hunters, while the other half earned above this threshold and may be able to pay a premium for quality
• A higher concentration of customers is found among those with undergraduate degrees, who are more educated, and they are separated equally into two clusters: singles, with more ability for discretionary spending, and married couples, with less spending power given children/teens in their households
• Customers above 55 are evenly distributed across all income groups
* More comprehensive findings and conclusions were provided in the project report, which is not released at the request of the Social Enterprise
RECOMMENDATIONS*
Segment 1 - Customers in the age range of 30 to 55 who earned below $50,000/annum
• Offer value-for-money products and services
• Highlight discounts and promotions
• Offer bundle deals and loyalty programs
• Target them with personalised marketing campaigns based on their purchase history and interests
Segment 3 - Customers with undergraduate degrees
• Offer educational and informative content
• Highlight the benefits of products and services for their careers and personal development
• Partner with other businesses that offer complementary products and services
• Target them with personalised marketing campaigns based on their interests and areas of expertise
* More recommendations were provided for each identified cluster in the project report, which is not released at the request of the Social Enterprise
