Predicting Customer’s Next Orders
Erdi Güngör
Content
Stages: Define Objective, Prepare Data, Algorithms, Interpret Results
Topics: Business objective, Data preparation, Feature creation, Classification, Predictive models
Objective
• The online shopping mall problem
To make it easy for customers to fill their basket with their personal favorites
What will be in the next market basket?
Objective
• Supervised Learning
• Unsupervised Learning (Apriori, Collaborative Filtering)
• Deep Learning Approach
• Markov Chain
Prepare Data / Source Data
• Open source data
Aisle: 135 records
Department: 21 records
Order Products Prior: 1048576 records
Order Products Train: 1048576 records
Orders: 3421083 records
Products: 49689 records
Prepare Data / Base Data
• Data Preparation
• Train set: 13863746 - 23   Test set: 4833292 - 24

Order_eval_set  User  Order  Order_2  Order_2_eval_set  Products_2  Reordered
Test            1234  4      1        Prior             A           0
Test            1234  4      1        Prior             B           0
Test            1234  4      1        Prior             C           0
Test            1234  4      2        Prior             D           0
Test            1234  4      2        Prior             B           1
Test            1234  4      2        Prior             E           0
Test            1234  4      3        Prior             F           0
Test            1234  4      3        Prior             G           1
Test            1234  4      3        Prior             E           1

Order_eval_set  User  Order  Products  Reordered
Prior           5678  10     X         0
Prior           5678  10     Y         0
Prior           5678  10     Z         0
Prior           5678  11     W         0
Prior           5678  11     Q         0
Prior           5678  11     Z         1
Train           5678  12     W         1
Train           5678  12     Q         1
Train           5678  12     Z         1
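
One common way to turn rows like these into a supervised-learning table is to take every product a user bought in prior orders as a candidate and label it 1 if it appears in the user's train order. The sketch below assumes that framing; the slides do not spell out the exact join the author used, and the function name `build_base_table` and the tuple layout are illustrative only.

```python
from collections import defaultdict

# Rows in the shape of the second table above:
# (eval_set, user_id, order_id, product, reordered)
rows = [
    ("Prior", 5678, 10, "X", 0), ("Prior", 5678, 10, "Y", 0), ("Prior", 5678, 10, "Z", 0),
    ("Prior", 5678, 11, "W", 0), ("Prior", 5678, 11, "Q", 0), ("Prior", 5678, 11, "Z", 1),
    ("Train", 5678, 12, "W", 1), ("Train", 5678, 12, "Q", 1), ("Train", 5678, 12, "Z", 1),
]

def build_base_table(rows):
    """Return one (user, product, label) row per product the user bought before.

    Assumption: label = 1 when the product is in the user's train order.
    """
    prior_products = defaultdict(set)  # user -> products seen in prior orders
    train_products = defaultdict(set)  # user -> products in the train order
    for eval_set, user, _order, product, _reordered in rows:
        target = prior_products if eval_set == "Prior" else train_products
        target[user].add(product)

    base = []
    for user, products in prior_products.items():
        for product in sorted(products):
            label = int(product in train_products[user])
            base.append((user, product, label))
    return base

base_table = build_base_table(rows)
for row in base_table:
    print(row)
```

For user 5678 this yields five candidate products, of which W, Q and Z are labeled as reordered.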
Prepare Data / Features
Customer Based
• Total number of orders / products / unique products / unique department / unique aisle / unique order day
• Sum / avg. of reordered products
• The customer period
• Avg. time between orders
• Rate of reordered products in total products
• The order count after the related product
Customer – Product Based
• Avg. sequence in market basket
• Sum of reordered information of product
• The max. / avg. hour to order product
• The min. / max. order number of product
• The most popular hour for product
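
A few of the features listed above can be sketched as plain aggregations over prior-order rows. This is a minimal illustration, not the author's pipeline: the row layout, the `cart_position` column, and the feature names are assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical prior-order rows:
# (user_id, order_number, product, cart_position, reordered)
prior = [
    (5678, 1, "X", 1, 0), (5678, 1, "Y", 2, 0), (5678, 1, "Z", 3, 0),
    (5678, 2, "W", 1, 0), (5678, 2, "Q", 2, 0), (5678, 2, "Z", 3, 1),
]

def customer_features(rows):
    """Per-customer counts and the rate of reordered products."""
    orders, products = defaultdict(set), defaultdict(set)
    reorders, total = defaultdict(int), defaultdict(int)
    for user, order_no, product, _pos, reordered in rows:
        orders[user].add(order_no)
        products[user].add(product)
        reorders[user] += reordered
        total[user] += 1
    return {
        user: {
            "n_orders": len(orders[user]),
            "n_products": total[user],
            "n_unique_products": len(products[user]),
            "reorder_rate": reorders[user] / total[user],
        }
        for user in total
    }

def customer_product_features(rows):
    """Per (customer, product) basket position and order-number range."""
    positions, order_nos = defaultdict(list), defaultdict(list)
    reorders = defaultdict(int)
    for user, order_no, product, pos, reordered in rows:
        key = (user, product)
        positions[key].append(pos)
        order_nos[key].append(order_no)
        reorders[key] += reordered
    return {
        key: {
            "avg_cart_position": sum(positions[key]) / len(positions[key]),
            "n_reorders": reorders[key],
            "first_order": min(order_nos[key]),
            "last_order": max(order_nos[key]),
        }
        for key in positions
    }

cust = customer_features(prior)
cust_prod = customer_product_features(prior)
print(cust[5678])
print(cust_prod[(5678, "Z")])
```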
Clustering Label
• K-means, 5 clusters
• Based on each customer's percentage usage of every sub-segment relative to the overall total
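
The clustering label can be illustrated with a plain Lloyd's-algorithm k-means over per-customer share vectors. This is a stdlib sketch under stated assumptions: the share vectors are randomly generated placeholders, the number of sub-segments (4) is arbitrary, and the slides do not say which k-means implementation or initialization was used.

```python
import random

def kmeans(points, k, n_iter=20, seed=0):
    """Plain Lloyd's algorithm; returns one cluster label per point."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: nearest center by squared Euclidean distance.
        for i, p in enumerate(points):
            labels[i] = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])),
            )
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster went empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

# Hypothetical input: for each customer, the share of purchases per sub-segment.
_rng = random.Random(1)

def random_shares(n_segments):
    raw = [_rng.random() for _ in range(n_segments)]
    total = sum(raw)
    return tuple(x / total for x in raw)  # shares sum to 1

customers = [random_shares(4) for _ in range(50)]
cluster_label = kmeans(customers, k=5)  # 5 clusters, as on the slide
print(cluster_label[:10])
```

The resulting label (0 to 4) can then be attached to each customer as an extra categorical feature.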
Algorithms
• Decision Trees, Naive Bayes, Instance-Based, Logistic Regression, Support Vector Machine, Regression
• Ensemble Learning: Random Forest, AdaBoost, Gradient Boosting
• Neural Networks
Algorithms
• Train-Test Split (K-fold Cross-Validation)
• Extreme Gradient Boosting (XGBoost)*
• Random Forest
• Logistic Regression
• Decision Tree
• Gradient Boosting
Algorithms
Extreme Gradient Boosting (XGBoost)
• Implementation of gradient boosted decision trees
• Regularization
• Parallel processing
• High flexibility (all types of data)
• Handling missing values and tree pruning
Algorithms
Extreme Gradient Boosting (XGBoost) Parameters
• objective (reg:logistic): Learning objective
• eval_metric (logloss): Evaluation metric, the negative log-likelihood
• eta (0.1): Step-size shrinkage of the feature weights to prevent overfitting
• max_depth (7): Maximum depth of a tree
• min_child_weight (9): Minimum sum of instance weight needed in a child
• gamma (0.80): Minimum loss reduction required to make a further partition
• subsample (0.76): Subsample ratio of the training instances
• colsample_bytree (0.95): Subsample ratio of columns when constructing each tree
• alpha (0): L1 regularization term on weights
• lambda (1): L2 regularization term on weights
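
The parameter list maps directly onto the dictionary that XGBoost's Python training functions accept. The sketch below only writes the slide's values down in that form, and adds a small stdlib `log_loss` helper (a name chosen for this example) to show what the `logloss` evaluation metric measures; it does not train a model.

```python
import math

# Values copied from the slide; keys are XGBoost's lowercase parameter names.
xgb_params = {
    "objective": "reg:logistic",
    "eval_metric": "logloss",
    "eta": 0.1,
    "max_depth": 7,
    "min_child_weight": 9,
    "gamma": 0.80,
    "subsample": 0.76,
    "colsample_bytree": 0.95,
    "alpha": 0,
    "lambda": 1,
}

def log_loss(y_true, y_prob, eps=1e-15):
    """Mean negative log-likelihood of binary labels under predicted probabilities."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(y_true)

# A model that always answers 0.5 scores ln(2) regardless of the labels.
baseline = log_loss([1, 0], [0.5, 0.5])
print(baseline)
```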
Algorithms
Extreme Gradient Boosting (XGBoost)
• DMatrix: XGBoost's data structure
• Training the model
• Label prediction: the model outputs probabilities
• Threshold (0.20) to convert probabilities into labels
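
In XGBoost's Python package the features would be wrapped in a `DMatrix`, passed to `xgb.train`, and scored with `Booster.predict`, which returns probabilities for a logistic objective. The final thresholding step can be sketched without the library. The scores below are made-up placeholders, and treating a probability equal to the threshold as a positive is an assumption; the slides only give the value 0.20.

```python
from collections import defaultdict

THRESHOLD = 0.20  # value from the slide

# Hypothetical model output: (user_id, product, predicted reorder probability)
scored = [
    (5678, "W", 0.62), (5678, "Q", 0.21), (5678, "Z", 0.85),
    (5678, "X", 0.05), (5678, "Y", 0.19),
]

def predict_baskets(scored, threshold=THRESHOLD):
    """Keep, per user, every product whose probability reaches the threshold."""
    baskets = defaultdict(list)
    for user, product, prob in scored:
        if prob >= threshold:  # assumption: ties count as a predicted reorder
            baskets[user].append(product)
    return dict(baskets)

baskets = predict_baskets(scored)
print(baskets)
```

A threshold well below 0.5 predicts more products per basket, trading precision for recall.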
Interpret Results
• 5-fold cross-validation model performance: over 90% accuracy
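
The 5-fold procedure behind a figure like this can be sketched as follows. Everything here is illustrative: `kfold_indices`, `cross_val_accuracy`, and the `majority_class` placeholder model are names invented for the example, and the labels are synthetic. The real evaluation would plug the trained classifier in place of the placeholder.

```python
import random

def kfold_indices(n_samples, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs; each sample is tested exactly once."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        test_idx = folds[i]
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, test_idx

def cross_val_accuracy(X, y, fit_predict, k=5):
    """Average accuracy over k folds for a fit_predict(X_tr, y_tr, X_te) callable."""
    scores = []
    for train_idx, test_idx in kfold_indices(len(y), k):
        preds = fit_predict(
            [X[i] for i in train_idx],
            [y[i] for i in train_idx],
            [X[i] for i in test_idx],
        )
        correct = sum(int(p == y[i]) for p, i in zip(preds, test_idx))
        scores.append(correct / len(test_idx))
    return sum(scores) / k

def majority_class(X_train, y_train, X_test):
    """Placeholder model: always predicts the most frequent training label."""
    majority = max(set(y_train), key=y_train.count)
    return [majority] * len(X_test)

# Synthetic, imbalanced labels: 16 negatives and 4 positives.
X = [[i] for i in range(20)]
y = [0] * 16 + [1] * 4
accuracy = cross_val_accuracy(X, y, majority_class, k=5)
print(accuracy)
```

On these synthetic labels the placeholder reaches 0.8 simply because 80% of the labels are 0, which is why an accuracy figure is easiest to interpret next to the class balance.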
THANK YOU
• Andrew Ng – Geoffrey Hinton
• https://www.kdnuggets.com/
• https://www.kaggle.com/