Data mining to improve e-mail marketing
Tayko
Smart Marketing using analytics
Business Problem
• Tayko is a software catalog firm that sells games and educational software.
• It wants to market a new collection through e-mail marketing.
• As a member of an industry consortium, Tayko can pull 200,000 e-mail addresses from the consortium's central repository.
• To maximize the benefit, Tayko wants to pull the records with a high probability of response and a high sale value.
Analytics Problem
1. Create a classification model that groups customers as responders/purchasers (1) or non-responders/non-purchasers (0).
2. Create a prediction model that predicts the sale value for responders (1).
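The two models are meant to be used together. One way to rank the 200,000 addresses is by expected revenue: the purchase probability from the classifier times the spend predicted for a purchaser. The sketch below assumes both scores are already available per customer; the `Scored` record and its field names are hypothetical, not part of the original analysis.

```python
from dataclasses import dataclass


@dataclass
class Scored:
    """Hypothetical per-customer scores produced by the two models."""
    customer_id: str
    p_purchase: float       # P(Purchase = 1) from the classification model
    predicted_spend: float  # spend predicted by the prediction model


def expected_value(rec: Scored) -> float:
    """Expected revenue from e-mailing this customer: P(purchase) * spend."""
    return rec.p_purchase * rec.predicted_spend


def top_n(records: list, n: int) -> list:
    """Return the n records with the highest expected revenue."""
    return sorted(records, key=expected_value, reverse=True)[:n]


if __name__ == "__main__":
    pool = [
        Scored("c1", 0.9, 50.0),   # likely buyer, small basket
        Scored("c2", 0.4, 300.0),  # less likely buyer, large basket
        Scored("c3", 0.1, 200.0),  # unlikely buyer
    ]
    for rec in top_n(pool, 2):
        print(rec.customer_id, round(expected_value(rec), 2))
```

Here c2 outranks c1 even though c1 is more likely to buy, which is why the business problem asks for both response probability and sale value.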
Data Collection
• Supervised learning techniques are applied, because the desired outputs (the target variables) are already defined.
• A sample of 2,000 customers is drawn from the central repository, and a test e-mail campaign is sent to them.
• The 2 target variables, Purchased and Spending, are recorded for the sample.
• The result shows 1,000 purchasers and 1,000 non-purchasers.
Data partitioning
• The data set is partitioned into:
  - Training: 60% (1,200 records)
  - Testing: 20% (400 records)
  - Validation: 20% (400 records)
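A 60/20/20 random split like this can be sketched with the standard library alone. The function name and seed are illustrative; the slides do not say how records were assigned to partitions.

```python
import random


def partition(records, train=0.6, test=0.2, seed=1):
    """Shuffle records and split them into training, test and validation sets.

    Whatever remains after the training and test shares goes to validation
    (20% with the defaults).
    """
    rows = list(records)
    random.Random(seed).shuffle(rows)
    n_train = round(len(rows) * train)
    n_test = round(len(rows) * test)
    return (
        rows[:n_train],
        rows[n_train:n_train + n_test],
        rows[n_train + n_test:],
    )


if __name__ == "__main__":
    tr, te, va = partition(range(2000))
    print(len(tr), len(te), len(va))  # 1200 400 400
```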
Initial Study
What kinds of variables are present?
Finding the variables with strong differentiation power – Nominal Variables
• Use of catalogs A, T, U and P shows a high percentage of customers making a purchase.
• Use of catalogs O and H shows a high percentage of customers not making a purchase.
• However, only catalogs A and U were used by more than 100 customers.
• Catalog H was used by more than 50 customers; all the other catalogs by fewer than 50.
• The distribution of customers across catalogs is therefore uneven, so the rates for rarely used catalogs rest on small samples.
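The catalog comparison above amounts to computing, for each source-catalog flag, how many customers carry it and what share of them purchased, while marking small groups. A minimal sketch, assuming each record is a dict with hypothetical 0/1 columns such as `source_a` and a 0/1 `Purchase` target; the 50-customer threshold echoes the counts discussed on the slide but is an arbitrary choice.

```python
def rate_by_flag(rows, flags, target="Purchase", min_n=50):
    """Per 0/1 flag column: customer count, purchase rate, small-sample marker."""
    summary = {}
    for flag in flags:
        users = [r for r in rows if r[flag] == 1]
        n = len(users)
        rate = sum(r[target] for r in users) / n if n else None
        summary[flag] = {"n": n, "rate": rate, "small_sample": n < min_n}
    return summary


if __name__ == "__main__":
    # Toy records, not the Tayko data.
    rows = [
        {"source_a": 1, "source_h": 0, "Purchase": 1},
        {"source_a": 1, "source_h": 0, "Purchase": 1},
        {"source_a": 0, "source_h": 1, "Purchase": 0},
    ]
    for flag, stats in rate_by_flag(rows, ["source_a", "source_h"], min_n=2).items():
        print(flag, stats)
```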
Other Nominal Variables
• Of the other categorical variables, "Order Online" is the only one that shows some power to differentiate purchasers from non-purchasers.
Ordinal Variables
• The number of purchases last year shows a clear trend:
  - Customers who made no purchase last year did not purchase from the new catalogs either.
  - Customers who made more than 3 purchases last year all made a purchase this time as well.
Scale Variables
• Of the 2 scale variables, only "Last update to customer record" shows a significant difference in mean between purchasers and non-purchasers.
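The slide does not name the test behind the significant difference in mean. One common choice for comparing the mean of a scale variable between purchasers and non-purchasers is Welch's two-sample t-test; the sketch below computes its statistic and approximate (Welch-Satterthwaite) degrees of freedom as an assumption, not as the author's procedure. The p-value would then be read from a t distribution, for example with a statistics package.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Welch's two-sample t statistic and approximate degrees of freedom."""
    va = variance(a) / len(a)  # squared standard error of mean(a)
    vb = variance(b) / len(b)  # squared standard error of mean(b)
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


if __name__ == "__main__":
    # Toy "days since last update" values, not the Tayko data.
    purchasers = [120, 300, 450, 800, 950]
    non_purchasers = [1500, 2100, 2600, 3000, 3300]
    print(welch_t(purchasers, non_purchasers))
```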
Target Variables
• Purchasers and non-purchasers are equally represented in the sample.
• However, the sale value (the amount spent by each customer) follows a non-normal distribution.
Classification
Who will make a purchase?
Logistic Regression – Training
Final set of variables
1. Frequency: number of transactions in the last year at the source catalog
2. Web Order: customer placed at least 1 order via the web
3. Address is Residence: the address is a residence
4. Source_a, Source_h, Source_u: the source catalog is A, H or U
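The slides report only the selected predictors, not the fitted coefficients, so none are reproduced here. As a rough illustration of what the training step does, the sketch below fits a logistic regression by batch gradient descent on the mean log-loss, with each row holding the four predictors (Frequency plus 0/1 flags for web order, residence and source catalog). It is a from-scratch stand-in, not the software used in the study; the learning rate, epoch count and toy data are arbitrary.

```python
from math import exp


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + exp(-z))
    e = exp(z)
    return e / (1.0 + e)


def predict_proba(w, row):
    """P(Purchase = 1) for one row, given weights [intercept, w1, ..., wk]."""
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], row)))


def fit_logistic(X, y, lr=0.3, epochs=5000):
    """Fit logistic regression by batch gradient descent on the mean log-loss."""
    k, n = len(X[0]), len(X)
    w = [0.0] * (k + 1)
    for _ in range(epochs):
        grad = [0.0] * (k + 1)
        for row, label in zip(X, y):
            err = predict_proba(w, row) - label  # d(log-loss)/d(logit)
            grad[0] += err
            for j, xj in enumerate(row):
                grad[j + 1] += err * xj
        w = [wj - lr * g / n for wj, g in zip(w, grad)]
    return w


if __name__ == "__main__":
    # Toy rows: [Frequency, web_order, address_is_res, source_a_h_or_u]
    X = [[0, 0, 1, 0], [0, 1, 0, 0], [1, 0, 1, 0],
         [3, 1, 1, 1], [4, 0, 0, 1], [5, 1, 0, 1]]
    y = [0, 0, 0, 1, 1, 1]
    w = fit_logistic(X, y)
    print([round(predict_proba(w, row), 2) for row in X])
```

In practice a statistics package would fit the model by maximum likelihood and also report standard errors, which this sketch does not.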
Logistic Regression – Testing & Validation
• Test
  - Overall accuracy: 80%
• Validation
  - Overall accuracy: 77%
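Overall accuracy is the share of records whose predicted class matches the actual class, (TP + TN) / N. A minimal sketch, assuming predicted probabilities are turned into classes with a 0.5 cutoff (the slides do not state the cutoff used):

```python
def classify(probs, cutoff=0.5):
    """Turn P(Purchase = 1) into 0/1 classes; the 0.5 cutoff is an assumption."""
    return [1 if p >= cutoff else 0 for p in probs]


def confusion(actual, predicted):
    """Return (TP, FP, FN, TN), treating 1 (purchaser) as the positive class."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)
    return tp, fp, fn, tn


def overall_accuracy(actual, predicted):
    """Share of correctly classified records: (TP + TN) / N."""
    tp, fp, fn, tn = confusion(actual, predicted)
    return (tp + tn) / (tp + fp + fn + tn)


if __name__ == "__main__":
    actual = [1, 1, 0, 0, 1]
    predicted = classify([0.9, 0.4, 0.2, 0.6, 0.7])
    print(confusion(actual, predicted), overall_accuracy(actual, predicted))
```

On the 400-record validation set, 77% overall accuracy corresponds to about 308 correctly classified customers.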
Decision Tree – Training
• The CHAID growing method gave the best results.
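CHAID (chi-squared automatic interaction detection) chooses each split by testing the independence of every candidate predictor and the target with a chi-square test, after merging categories that do not differ significantly, and splits on the predictor with the smallest Bonferroni-adjusted p-value. The sketch below covers only that core test for binary predictors such as web order or residence, whose 2x2 tables have 1 degree of freedom; it omits category merging, the Bonferroni adjustment and stopping rules, and the predictor names and counts are illustrative.

```python
from math import erfc, sqrt


def chi2_2x2(table):
    """Pearson chi-square statistic and p-value (1 df) for [[a, b], [c, d]].

    Rows are predictor levels (0/1); columns are non-purchasers/purchasers.
    """
    row = [sum(r) for r in table]
    col = [sum(c) for c in zip(*table)]
    n = sum(row)
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row[i] * col[j] / n
            stat += (table[i][j] - expected) ** 2 / expected
    # For 1 degree of freedom, P(chi2 > x) = erfc(sqrt(x / 2)).
    return stat, erfc(sqrt(stat / 2))


def best_binary_split(tables):
    """Name of the binary predictor with the smallest chi-square p-value."""
    return min(tables, key=lambda name: chi2_2x2(tables[name])[1])


if __name__ == "__main__":
    tables = {
        "web_order": [[30, 10], [10, 30]],       # toy counts
        "address_is_res": [[20, 20], [20, 20]],
    }
    print(best_binary_split(tables), chi2_2x2(tables["web_order"]))
```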
Decision Tree – Test & Validate
• Test
  - Overall accuracy: 76%
• Validation
  - Overall accuracy: 74%
Result
• Logistic regression gives better results than the decision tree: 80% vs. 76% overall accuracy on the test set and 77% vs. 74% on the validation set.
Prediction
How much a purchaser will spend?
New Calculated Variables
• "last_update_days_ago" and "1st_update_days_ago" are highly correlated.
• A new calculated variable, DayDiff, is defined as the difference between these 2 variables.
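The correlation check and the new variable can be sketched as below. Pearson's coefficient measures the linear association between the two date variables; DayDiff is assumed here to be 1st_update_days_ago minus last_update_days_ago, so it is non-negative when the first update is older (the slide does not state the order of subtraction). The function names are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def day_diff(first_update_days_ago, last_update_days_ago):
    """Assumed definition: days between the first and last update of a record."""
    return [f - l for f, l in zip(first_update_days_ago, last_update_days_ago)]


if __name__ == "__main__":
    first = [3000, 2500, 1200, 400]  # toy values
    last = [2900, 2300, 1100, 400]
    print(round(pearson(first, last), 3), day_diff(first, last))
```

Presumably the aim is to give the regression one variable that carries the information the two nearly collinear inputs share less stably.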
Multiple Linear Regression
• Pre-processing: univariate analysis and transformation of the target variable "Spend"
  - Outlier removal
  - Filtering
  - Transformation
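The slide names the pre-processing steps but not their rules. The sketch below fills them in with common choices, all of which are assumptions: filtering keeps only purchasers (the prediction model is for responders), outliers are values more than 1.5 times the interquartile range beyond the quartiles, and the transformation is log(1 + Spend).

```python
from math import log1p
from statistics import quantiles


def preprocess_spend(spend, purchased, k=1.5):
    """Filter, remove outliers from, and transform the Spend target (assumed rules)."""
    # Filtering (assumed): keep purchasers only, since only they have a sale value.
    values = [s for s, p in zip(spend, purchased) if p == 1]
    # Outlier removal (assumed): Tukey fences at k * IQR beyond the quartiles.
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    kept = [v for v in values if low <= v <= high]
    # Transformation (assumed): log(1 + Spend) to reduce skew.
    return [log1p(v) for v in kept]


if __name__ == "__main__":
    spend = [10, 20, 30, 40, 50, 60, 70, 1000, 0]  # toy values
    purchased = [1, 1, 1, 1, 1, 1, 1, 1, 0]
    print(preprocess_spend(spend, purchased))
```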
Model & Performance
• 4 models are generated, one for each combination of address type and web-order status. Each Catalog term is a 0/1 indicator of the source catalog.
• Case 1: Non-residence address, not a web order (R-sq: 0.569, adj. R-sq: 0.566)
  Spending = -15.733 + 79.11 * No. of transactions last year - 47.825 * Catalog D + 30.632 * Catalog U
• Case 2: Non-residence address, web order (R-sq: 0.62, adj. R-sq: 0.616)
  Spending = -42.285 + 115.976 * No. of transactions last year + 45.506 * Catalog U - 247.655 * Catalog H + 55.605 * Catalog R
• Case 3: Residence address, not a web order (R-sq: 0.516, adj. R-sq: 0.507)
  Spending = -26.965 + 69.218 * No. of transactions last year + 66.219 * Catalog U - 113.587 * Catalog H
• Case 4: Residence address, web order (R-sq: 0.612, adj. R-sq: 0.592)
  Spending = -4.616 + 65.114 * No. of transactions last year - 111.934 * Catalog H - 81.28 * Catalog R - 129.754 * Catalog C + 66.242 * Catalog A
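Because the four equations are conditioned on address type and web-order status, scoring a purchaser means picking the equation for their case and plugging in the transaction count and catalog indicators. The sketch below encodes the coefficients exactly as reported above; it assumes each Catalog term is a 0/1 indicator, and since the slides mention a transformation of Spend without giving it, the outputs are on the original Spend scale only if the equations were reported on that scale. The names `MODELS` and `predict_spending` are hypothetical.

```python
# Coefficients as reported on the slide, keyed by (residence address, web order).
# Each entry: (intercept, coefficient on no. of transactions last year,
#              {catalog: coefficient on its 0/1 indicator}).
MODELS = {
    (False, False): (-15.733, 79.11, {"D": -47.825, "U": 30.632}),
    (False, True): (-42.285, 115.976, {"U": 45.506, "H": -247.655, "R": 55.605}),
    (True, False): (-26.965, 69.218, {"U": 66.219, "H": -113.587}),
    (True, True): (-4.616, 65.114,
                   {"H": -111.934, "R": -81.28, "C": -129.754, "A": 66.242}),
}


def predict_spending(transactions, catalogs, residence, web_order):
    """Predicted Spending for one purchaser under the matching case model.

    transactions: no. of transactions last year; catalogs: set of source
    catalogs whose indicator is 1. Catalogs absent from the case model add 0.
    """
    intercept, b_freq, b_cat = MODELS[(residence, web_order)]
    return intercept + b_freq * transactions + sum(b_cat.get(c, 0.0) for c in catalogs)


if __name__ == "__main__":
    # Case 1 customer: 2 transactions last year, source catalog U.
    print(round(predict_spending(2, {"U"}, residence=False, web_order=False), 3))  # 173.119
```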
MAD & MAPE
• Training
  - MAD: 68.89
  - MAPE: 103%
• Test
  - MAD: 104.53
  - MAPE: 109%
• Validation
  - MAD: 104.03
  - MAPE: 101%
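For reference, MAD here is the mean absolute deviation of the prediction errors and MAPE the mean absolute percentage error. A minimal sketch of both (function names are illustrative):

```python
def mad(actual, predicted):
    """Mean absolute deviation of the errors: mean(|actual - predicted|)."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mape(actual, predicted):
    """Mean absolute percentage error in percent; actual values must be non-zero."""
    return 100 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


if __name__ == "__main__":
    actual = [100, 50]     # toy spend values
    predicted = [80, 100]
    print(mad(actual, predicted), mape(actual, predicted))
```

A MAPE above 100% means the absolute error is, on average, larger than the actual amount spent; this tends to happen when small actual values receive much larger predictions.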
Regression Tree
• Growing method: Exhaustive CHAID
MAD & MAPE
• Training
  - MAD: 105.37
  - MAPE: 95%
• Test
  - MAD: 121.54
  - MAPE: 103%
• Validation
  - MAD: 121.31
  - MAPE: 113%
Decision
• Both models are very weak at predicting the amount spent.
• The error indicators are high: on the test and validation sets, MAPE exceeds 100% for both models, and MAD is about 104 for the multiple linear regression and about 121 for the regression tree.
• One major reason may be the scarcity of scale variables and the high correlation between the scale variables that are available.
• Since most variables are nominal, converting the prediction problem into a classification problem might produce better results, but this was outside the scope of the given problem.
Conclusion
• The classification of customers into purchasers and non-purchasers shows good results (77% overall accuracy on the validation set), and the selected logistic regression model is expected to perform well in a live setting too.
• However, the prediction models perform weakly, and a high degree of error is expected if they are used in their current state.

Uploaded by Ritu Sarkar
