Showcase: on segmentation importance for
marketing campaign in retail using R and H2O
Wit Jakuczun, WLOG Solutions
Introduction
Agenda
“At the corner” business case
What is segmentation?
How to build (optimal) segmentation models in H2O?
How to combine segmentation and predictive modelling?
Summary
Who am I?
Job: owner of WLOG Solutions
Education: mathematician (Warsaw University)
My expertise:
Solving business problems with analytical solutions
Implementing and delivering optimization and predictive models
Contact details:
email: w.jakuczun@wlogsolutions.com
WWW: www.wlogsolutions.com
If you want to follow me…
I have used:
R version 3.3.2
H2O version 3.10.0.8
Code available at WLOG’s GitHub space
Direct link
First step
Download the repository using the link and extract it into any folder
Open retail-segmentation-based-marketing-campaign-in-r-and-h2o.Rproj in RStudio
The result
Packages installation
source("install_packages.R")
Important: all packages are installed locally in the libs folder, so your R
environment stays untouched.
At the corner’s business case
What is At the corner?
At the corner is an analytics-driven retail chain selling a
wide variety of products.
What is At the corner’s business challenge?
At the corner would like to introduce a new product.
What is their business approach?
At the corner decided to go with an e-mail marketing
campaign. To optimize campaign costs and customers’
comfort they decided to carefully select customers that
would be contacted in the campaign.
What has already been done?
A pilot campaign has been conducted and customers’ responses gathered
An analytical table has been prepared
What is in the analytical table?
library(data.table)  # provides fread()
data_train <- fread("data/retail_train.csv")
colnames(data_train)
We have the following categories of variables:
marketing - did we contact the customer before?
purchased - did the customer purchase as a result of the pilot campaign?
demographic - sex, age, income
behavioural - what is the customer’s buying pattern?
basket of products
basket value
purchases in the nearest shop
mean distance to shops
What is our goal?
Score customers in data/retail_test.csv.
Our approach
We will build three types of models:
(M1) Logistic regression with variables from analytical table.
(M2) Logistic regression with variables from analytical table and
a variable from a segmentation model based on behavioural
variables.
(M3) Local logistic regression models with variables from
analytical table for segments calculated by segmentation
model based on behavioural variables.
We will select the best one using the AUC measure.
First model - no segmentation
Most important parts:
source("build_p2b_nosegmentation_model.R")
What is interesting?
?h2o.grid - tuning of model hyperparameters
find_best_model.R - finds the best model according to the AUC measure
?h2o.auc - internal H2O function for calculating AUC
pROC package
?pROC::roc - ROC curve calculation
?pROC::auc - AUC calculation for a ROC curve
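
The AUC used for model selection can also be computed by hand, which helps when reading the numbers on the following slides. Below is a minimal base-R sketch using the rank-based (Mann-Whitney) formula. The function name auc_rank and the toy data are illustrative assumptions, not part of the repository, and the result should agree closely with (not necessarily match digit for digit) what h2o.auc reports.

```r
# Rank-based (Mann-Whitney) AUC: the probability that a randomly chosen
# positive case receives a higher score than a randomly chosen negative one.
auc_rank <- function(labels, scores) {
  stopifnot(length(labels) == length(scores), all(labels %in% c(0, 1)))
  n_pos <- sum(labels == 1)
  n_neg <- sum(labels == 0)
  r <- rank(scores)  # average ranks, so ties count as one half
  (sum(r[labels == 1]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

# Toy data, made up for illustration only
labels <- c(0, 0, 1, 1)
scores <- c(0.1, 0.4, 0.35, 0.8)
auc_value <- auc_rank(labels, scores)
print(auc_value)  # 0.75
```

An AUC of 0.5 corresponds to random ordering and 1.0 to perfect ordering of purchasers above non-purchasers.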
First model - results
Our baseline is AUC = 0.6460
And what does Flow say?
http://localhost:54321/flow/index.html
Can we do better with segmentation?
What is segmentation?
What is your definition?
Naive definition
An unsupervised approach for discovering groups of similar
objects according to some distance/similarity measure.
My (our) definition
Discovering latent variables that are strongly non-linear
transformations of the input space. These transformations,
being based on a metric on the input space, are too difficult for
standard supervised algorithms to discover.
Why is segmentation difficult?
Business perspective
It is almost impossible to formalize the requirements for a good
segmentation in general.
But it is possible (next slides) to formalize the requirements for a
good segmentation in predictive modelling.
Technical perspective
The final segments depend on both the variables and the distance.
The number of segments is unknown and must be calculated from
the data or given by an oracle.
Things to be considered
Popular algorithms (like k-means) are randomized
Repeat the segmentation N times.
Select the best segments using e.g. the within-cluster sum of squares metric
and iterative
Give enough iterations to be sure the algorithm has
converged.
Sometimes segment centres cannot be a mean
More expensive medoid approaches can be used
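
The "repeat N times and keep the best" advice can be sketched as follows. This is a minimal illustration in base R, with stats::kmeans standing in for h2o.kmeans so that it runs without an H2O cluster. The synthetic data, the variable names and the value of rounds are assumptions made for the example.

```r
set.seed(42)

# Synthetic two-group data standing in for standardized behavioural variables
x <- rbind(matrix(rnorm(100, mean = 0), ncol = 2),
           matrix(rnorm(100, mean = 4), ncol = 2))
x <- scale(x)

rounds <- 10  # hypothetical number of random restarts

# Each call starts from different random centres; iter.max is set generously
# so that every run has a chance to converge.
fits <- lapply(seq_len(rounds), function(i) {
  kmeans(x, centers = 2, iter.max = 100, nstart = 1)
})

# Keep the run with the smallest total within-cluster sum of squares
wss <- vapply(fits, function(f) f$tot.withinss, numeric(1))
best_fit <- fits[[which.min(wss)]]

print(best_fit$tot.withinss)
print(best_fit$ifault)  # 0 means no convergence problem was reported
```

In base R the same effect is available through the nstart argument of kmeans; the explicit loop is shown only because it mirrors the description on the slide.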
How to measure goodness of segmentation methods?
A very informative method is the silhouette
It is only useful when comparing segmentations that use the same distance,
for example when choosing the number of segments.
But we are doing predictive modelling
Use the predictive power of the final models!
What is good segmentation for predictive model?
Good segmentation is a segmentation that significantly
improves predictive model quality measure (e.g. AUC).
How to build segmentation models in H2O
and R?
What is available?
H2O provides a k-means algorithm
Tutorial is here
Let’s analyse the code (1)
Check build_segmentation_models.R
For each number of segments in cluster_cnts
Generate rounds segmentations (rounds is the number of repetitions) and select the best one
Let’s analyse the code (2)
Main part - fitting the model:
segmentation_model <- h2o.kmeans(
  training_frame = training_frame,
  x = segmentation_vars,
  k = cluster_cnt,
  model_id = sprintf("segmentation_model_%s", cluster_cnt),
  init = "PlusPlus",
  standardize = TRUE)
Let’s analyse the code (3)
And scoring segmentation model (check file
predict_segmentation_models.R)
h2o.predict(segmentation_model, newdata = train_df)
How to combine segmentation and
predictive modelling?
Two approaches
Use segment assignment as another predictor.
Build local models for segments.
Segment as another predictor
Check build_p2b_segmentation_model.R
Most important parts:
Lines 45-49: building segmentation models
Lines 53-55: predict segmentation models
Lines 59-77: build predictive models with segments
Lines 61-62: assign segments to customers
Lines 64-74: select best model for given number of segments
Lines 81-93: select best model
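
The idea behind this script (model M2) can be illustrated without H2O. The sketch below uses base R, with kmeans and glm standing in for h2o.kmeans and the H2O logistic regression. The data are synthetic and every column name is a hypothetical stand-in for the analytical table, so treat it as an outline of the technique rather than the repository code.

```r
set.seed(1)
n <- 400

# Synthetic stand-ins for the analytical table (all names are hypothetical)
d <- data.frame(
  age          = rnorm(n, 40, 10),
  basket_value = c(rnorm(n / 2, 20, 5), rnorm(n / 2, 60, 5)),
  distance     = c(rnorm(n / 2, 2, 0.5), rnorm(n / 2, 8, 0.5))
)
d$purchased <- rbinom(n, 1, ifelse(d$basket_value > 40, 0.6, 0.2))

# Segmentation on standardized behavioural variables only
behav <- scale(d[, c("basket_value", "distance")])
seg <- kmeans(behav, centers = 2, nstart = 10)
d$segment <- factor(seg$cluster)

# M1: logistic regression without the segment
m1 <- glm(purchased ~ age + basket_value + distance,
          data = d, family = binomial)

# M2: the same model with the segment label as an extra categorical predictor
m2 <- glm(purchased ~ age + basket_value + distance + segment,
          data = d, family = binomial)

print(c(m1 = deviance(m1), m2 = deviance(m2)))
```

The deviances printed here are in-sample and only show that the segment enters the model as one more term; the comparison in the talk is made with AUC.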
Segment as another predictor - results
Best number of segments is 2.
We obtained AUC = 0.6470
Local models for segments
Check build_p2b_segmentation_local_models.R
Most important parts:
Lines 48-52: building segmentation models
Lines 55-57: predict segmentation models
Lines 61-83: build local models for segments
Line 63: assign segments to customers
Lines 67-79: build models for segments for different number of
segments
Lines 87-105: predict local models for test data
Lines 107-132: select best models
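
The local-model approach (M3) can be outlined in a few lines. The sketch below is a base-R illustration, with kmeans and glm standing in for the H2O calls. The data are synthetic, the column names are hypothetical, and the age effect is deliberately made to differ between the two groups so that separate models have something to capture.

```r
set.seed(2)
n <- 400

# Synthetic data (hypothetical column names)
d <- data.frame(
  age          = rnorm(n, 40, 10),
  basket_value = c(rnorm(n / 2, 20, 5), rnorm(n / 2, 60, 5))
)
# The sign of the age effect depends on the group
p <- ifelse(d$basket_value > 40,
            plogis(0.08 * (d$age - 40)),
            plogis(-0.08 * (d$age - 40)))
d$purchased <- rbinom(n, 1, p)

# Segment customers on the behavioural variable
seg <- kmeans(scale(d$basket_value), centers = 2, nstart = 10)
d$segment <- seg$cluster

# One logistic regression per segment
local_models <- lapply(split(d, d$segment), function(part) {
  glm(purchased ~ age + basket_value, data = part, family = binomial)
})

# Score every customer with the model of the segment it belongs to
d$score <- NA_real_
for (s in names(local_models)) {
  idx <- d$segment == as.integer(s)
  d$score[idx] <- predict(local_models[[s]], newdata = d[idx, ],
                          type = "response")
}

print(sapply(local_models, function(m) coef(m)["age"]))
```

For new customers, the segmentation model has to be applied first so that each row is routed to the right local model.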
Local models for segments - results
Best number of segments is 2.
We obtained AUC = 0.6512
Summary
Summary of results
No segmentation was worst with AUC = 0.6460
Segmentation as a predictor was second best with AUC = 0.6470
Local models were best with AUC = 0.6512
Are the differences significant?
Check compare_models.R
Most important parts
One can test whether the differences between ROC curves are significant
We used DeLong’s test
Conclusions
Adding segmentation as a predictor gives a significant improvement.
Local models give a significant improvement over segment as
predictor.
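
A comparison of this kind can be run with roc.test from the pROC package. The sketch below is not the content of compare_models.R. It assumes pROC is installed (the block is skipped otherwise) and uses two made-up score vectors for the same customers in place of real model predictions.

```r
if (requireNamespace("pROC", quietly = TRUE)) {
  set.seed(3)
  n <- 500

  # Synthetic outcome and two hypothetical models scoring the same customers
  y <- rbinom(n, 1, 0.3)
  score_a <- y + rnorm(n, sd = 2)  # noisier scores
  score_b <- y + rnorm(n, sd = 1)  # less noisy scores

  # Fix levels and direction explicitly so both curves are built the same way
  roc_a <- pROC::roc(y, score_a, levels = c(0, 1), direction = "<")
  roc_b <- pROC::roc(y, score_b, levels = c(0, 1), direction = "<")

  # DeLong's test for the difference between two correlated AUCs
  test_result <- pROC::roc.test(roc_a, roc_b, method = "delong")
  print(test_result$p.value)
}
```

A small p-value suggests that the two AUCs differ by more than sampling noise would explain.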
Thank you for your attention!