BUILDING A PREDICTIVE MODEL
AN EXAMPLE OF A PRODUCT RECOMMENDATION ENGINE

Alex Lin
Senior Architect
Intelligent Mining
alin@intelligentmining.com
Outline
- Predictive modeling methodology
- k-Nearest Neighbor (kNN) algorithm
- Singular value decomposition (SVD) method for dimensionality reduction
- Using a synthetic data set to test and improve your model
- Experiment and results
The Business Problem
- Design a product recommender solution that will increase revenue.
How Do We Increase Revenue?

  Increase Revenue
    - Increase Conversion
    - Increase Avg. Order Value
        - Increase Unit Price
        - Increase Units / Order
Example
- Is this recommendation effective?

  [Screenshot of a product recommendation, annotated with the two levers it pulls: Increase Unit Price and Increase Units / Order]
What am I going to do?
Predictive Model Framework

  Data -> Features -> ML Algorithm -> Prediction Output

  Data:       What data?
  Features:   What features?
  Algorithm:  Which algorithm?
  Output:     Cross-sell & Up-sell Recommendation
What Data to Use?
- Explicit data
    - Ratings
    - Comments
- Implicit data
    - Order history / Return history
    - Cart events
    - Page views
    - Click-thru
    - Search log
- In today’s talk we only use Order history and Cart events.
Predictive Model

  Data -> Features -> ML Algorithm -> Prediction Output

  Data:       Order History, Cart Events
  Features:   What features?
  Algorithm:  Which algorithm?
  Output:     Cross-sell & Up-sell Recommendation
What Features to Use?
- We know that a given product tends to get purchased by customers with similar tastes or needs.
- Use user engagement data to describe a product.

  Example: item 17’s user engagement vector, one entry per user 1..n, with a weight where a user engaged (e.g. 1 for an order, .25 for a cart event) and nulls elsewhere:

  item 17: [1, _, .25, _, _, .25, _, 1, _, .25, ..., _]
Data Representation / Features
- When we merge every item’s user engagement vector, we get an m x n item-user matrix.

  [m x n item-user matrix: rows are items 1..m, columns are users 1..n; each row is that item’s user engagement vector, with entries such as 1 and .25 where users engaged and nulls elsewhere]
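This matrix is extremely sparse (most users never touch most items), so in practice each row is stored sparsely rather than as a dense 2M-entry array like the one in the cosine-similarity fragment later in the deck. A minimal sketch, with illustrative names that are not from the slides:

/* One item's user engagement vector, stored sparsely: only the users
 * who engaged, with their weights (e.g. 1 for an order, .25 for a cart event). */
typedef struct {
    long   *user_ids;   /* engaged users, sorted ascending */
    double *weights;    /* weight per engaged user */
    long    nnz;        /* number of non-null entries */
} SparseRow;

/* Dot product of two sparse rows by merging the sorted id lists:
 * O(nnz_a + nnz_b) instead of O(n) over the full user space. */
double sparse_dot(const SparseRow *a, const SparseRow *b) {
    double dot = 0;
    long i = 0, j = 0;
    while (i < a->nnz && j < b->nnz) {
        if (a->user_ids[i] == b->user_ids[j])
            dot += a->weights[i++] * b->weights[j++];
        else if (a->user_ids[i] < b->user_ids[j])
            i++;
        else
            j++;
    }
    return dot;
}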
Data Normalization
- Ensure the magnitudes of the entries in the dataset matrix are appropriate.

  [Item-user matrix shown before and after normalization: raw entries of 1 become rescaled weights such as .5, .9, .92, .49, .79, .67, .46, .73, .39, .82, .76, .69, .52, .8]

- Remove column average, so frequent buyers don’t dominate the model.
Data Normalization
- Different engagement data points (Order / Cart / Page View) should have different weights.
- Common normalization strategies (a sketch of the first one follows this list):
    - Remove column average
    - Remove row average
    - Remove global mean
    - Z-score
    - Fill in the null values
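A sketch of column-average removal, assuming a dense matrix for clarity (real data would be sparse, and 0.0 stands in for "null" here; the names are illustrative): subtract each user column's average from that column's non-null entries so heavy buyers stop dominating.

/* Remove each column's (user's) average from its non-null entries.
 * data is an n_items x n_users matrix in row-major layout. */
void remove_column_average(double *data, long n_items, long n_users) {
    for (long u = 0; u < n_users; u++) {
        double sum = 0;
        long cnt = 0;
        for (long i = 0; i < n_items; i++) {
            double v = data[i * n_users + u];
            if (v != 0.0) { sum += v; cnt++; }
        }
        if (cnt == 0) continue;           /* user never engaged: nothing to do */
        double avg = sum / cnt;
        for (long i = 0; i < n_items; i++)
            if (data[i * n_users + u] != 0.0)
                data[i * n_users + u] -= avg;
    }
}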
Predictive Model

  Data -> Features -> ML Algorithm -> Prediction Output

  Data:       Order History, Cart Events
  Features:   User engagement vector + Data Normalization
  Algorithm:  Which algorithm?
  Output:     Cross-sell & Up-sell Recommendation
Which Algorithm?
- How do we find the items that have similar user engagement data?

  [Item-user matrix with rows for items 1, 2, 17, 18, ..., m: items 1 and 17 share several engaged users]

- We can find the items that have similar user engagement vectors with the kNN algorithm.
k-Nearest Neighbor (kNN)
- Find the k items that have the most similar user engagement vectors.

  [Item-user matrix with rows for items 1, 2, 3, 4, ..., m; item 4’s row overlaps most with the rows of items 2, 3, and 1]

- Nearest Neighbors of Item 4 = [2, 3, 1]
Similarity Measure for kNN

  Example rows from the item-user matrix:
    item 2 (vector a): (1, .5, 1) over its three engaged users
    item 4 (vector b): (1, .5, 1, 1) over its four engaged users

- Jaccard coefficient:

    sim(a,b) = \frac{(1+1)}{(1+1+1) + (1+1+1+1) - (1+1)}

- Cosine similarity:

    sim(a,b) = \cos(a,b) = \frac{a \cdot b}{\|a\|_2 \, \|b\|_2}
             = \frac{1 \cdot 1 + 0.5 \cdot 1}{\sqrt{1^2 + 0.5^2 + 1^2} \cdot \sqrt{1^2 + 0.5^2 + 1^2 + 1^2}}

- Pearson correlation:

    corr(a,b) = \frac{\sum_i (r_{ai} - \bar{r}_a)(r_{bi} - \bar{r}_b)}
                     {\sqrt{\sum_i (r_{ai} - \bar{r}_a)^2} \, \sqrt{\sum_i (r_{bi} - \bar{r}_b)^2}}
              = \frac{m \sum a_i b_i - \sum a_i \sum b_i}
                     {\sqrt{m \sum a_i^2 - (\sum a_i)^2} \, \sqrt{m \sum b_i^2 - (\sum b_i)^2}}
              = \frac{match\_cols \cdot Dotprod(a,b) - sum(a) \cdot sum(b)}
                     {\sqrt{match\_cols \cdot sum(a^2) - (sum(a))^2} \, \sqrt{match\_cols \cdot sum(b^2) - (sum(b))^2}}
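A minimal C sketch of the one-pass Pearson form above, computed over the m matched columns of the two vectors (an illustrative helper, not from the slides):

#include <math.h>

/* Pearson correlation over m matched columns, using the one-pass sums form:
 * (m*dot - sa*sb) / (sqrt(m*saa - sa^2) * sqrt(m*sbb - sb^2)) */
double pearson(const double *a, const double *b, long m) {
    double sa = 0, sb = 0, saa = 0, sbb = 0, dot = 0;
    for (long i = 0; i < m; i++) {
        sa  += a[i];        sb  += b[i];
        saa += a[i] * a[i]; sbb += b[i] * b[i];
        dot += a[i] * b[i];
    }
    double denom = sqrt(m * saa - sa * sa) * sqrt(m * sbb - sb * sb);
    return denom > 0 ? (m * dot - sa * sb) / denom : 0.0;
}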
k-Nearest Neighbor (kNN)

  [Scatter plot of items 1-9 in the item feature space, with cosine similarity as the similarity measure]

  kNN, k=5: Nearest Neighbors(8) = [9, 6, 3, 1, 2]
Predictive Model
- Ver. 1: kNN

  Data -> Features -> ML Algorithm -> Prediction Output

  Data:       Order History, Cart Events
  Features:   User engagement vector + Data Normalization
  Algorithm:  k-Nearest Neighbor (kNN)
  Output:     Cross-sell & Up-sell Recommendation
Cosine Similarity – Code fragment

#include <stdio.h>
#include <math.h>

long i_cnt = 100000;         // number of items: 100K
long u_cnt = 2000000;        // number of users: 2M
double data[i_cnt][u_cnt];   // 100K by 2M dataset matrix (in reality, it needs to be malloc allocation)
double norm[i_cnt];          // vector norm of each item row

// assume data matrix is loaded
……

// calculate vector norm for each user engagement vector
for (long i = 0; i < i_cnt; i++) {
    norm[i] = 0;
    for (long f = 0; f < u_cnt; f++) {
        norm[i] += data[i][f] * data[i][f];
    }
    norm[i] = sqrt(norm[i]);
}

// cosine similarity calculation
for (long i = 0; i < i_cnt; i++) {          // loop thru 100K items
    for (long j = 0; j < i_cnt; j++) {      // loop thru 100K items
        double dot_product = 0;
        for (long f = 0; f < u_cnt; f++) {  // loop thru entire user space: 2M
            dot_product += data[i][f] * data[j][f];
        }
        printf("%ld %ld %lf\n", i, j, dot_product / (norm[i] * norm[j]));
    }
}
// find the Top K nearest neighbors here
…….

Two problems with this brute-force version:
1. 100K rows x 100K rows x 2M features --> scalability problem.
   Remedies: kd-tree, locality-sensitive hashing, MapReduce/Hadoop, multicore/threading, stream processors.
2. data[i] is high-dimensional and sparse, so similarity measures are not reliable --> accuracy problem.

This leads us to SVD dimensionality reduction!
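The fragment stops at "find the Top K nearest neighbors here". One way to finish it, sketched under the assumption that we keep a per-item list of the K best (item, similarity) pairs and insert each candidate as it is computed inside the j loop:

#define K 20

/* Illustrative top-K tracker: insert (item, sim) into a list kept
 * sorted by descending similarity. O(K) per candidate. */
typedef struct { long item; double sim; } Neighbor;

void topk_insert(Neighbor *top, int *n, long item, double sim) {
    if (*n == K && sim <= top[K - 1].sim) return;  /* not better than current worst */
    int pos = (*n < K) ? (*n)++ : K - 1;           /* append, or overwrite the worst */
    while (pos > 0 && top[pos - 1].sim < sim) {    /* shift worse entries down */
        top[pos] = top[pos - 1];
        pos--;
    }
    top[pos].item = item;
    top[pos].sim = sim;
}

Since inserting is O(K) per candidate, the top-K step adds very little on top of the dot products themselves.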
Singular Value Decomposition (SVD)

  A = U \times S \times V^T

  A:   m x n matrix (items x users)
  U:   m x r matrix
  S:   r x r matrix
  V^T: r x n matrix

  Keeping only the top k singular values (rank = k, k < r) gives the low-rank approximation:

  A_k = U_k \times S_k \times V_k^T

- Low-rank approx. item profile is U_k * S_k
- Low-rank approx. user profile is S_k * V_k^T
- Low-rank approx. item-user matrix is U_k * S_k * S_k * V_k^T
Reduced SVD

  A_k = U_k \times S_k \times V_k^T

  A_k:   100K x 2M matrix (items x users)
  U_k:   100K x 3 matrix
  S_k:   3 x 3 diagonal matrix of descending singular values, e.g. diag(7, 3, 1)
  V_k^T: 3 x 2M matrix

  (rank = 3)

- Low-rank approx. item profile is U_k * S_k
SVD Factor Interpretation

  [Singular values plot (rank = 512): values descend from left to right. The more significant values on the left correspond to latent factors; the tail of less significant values corresponds to noise and other residue. Example S (3 x 3): diag(7, 3, 1), descending singular values.]
SVD Dimensionality Reduction

  [U_k * S_k: an items x latent-factors matrix, far narrower than the items x users matrix; each row is an item profile over the latent factors, with candidate ranks from 3 to 10 shown]

- Need to find the optimal low rank!
Missing values
- Difference between “0” and “unknown”
- Missing values do NOT appear randomly.
- Value = (Preference Factors) + (Availability) – (Purchased elsewhere) – (Navigation inefficiency) – etc.
- Approx. Value = (Preference Factors) +/- (Noise)
- Modeling missing values correctly will help us make good recommendations, especially when working with an extremely sparse data set.
Singular Value Decomposition (SVD)
- Use SVD to reduce dimensionality, so neighborhood formation happens in the reduced user space.
- SVD helps the model find the low-rank approximation of the dataset matrix, while retaining the critical latent factors and ignoring noise.
- The optimal low rank needs to be tuned.
- SVD is computationally expensive.
- SVD libraries:
    - Matlab: [U, S, V] = svds(A, 256);
    - SVDPACKC http://www.netlib.org/svdpack/
    - SVDLIBC http://tedlab.mit.edu/~dr/SVDLIBC/
    - GHAPACK http://www.dcs.shef.ac.uk/~genevieve/ml.html
Predictive Model
- Ver. 2: SVD+kNN

  Data -> Features -> ML Algorithm -> Prediction Output

  Data:       Order History, Cart Events
  Features:   User engagement vector + Data Normalization + SVD
  Algorithm:  k-Nearest Neighbors (kNN) in reduced space
  Output:     Cross-sell & Up-sell Recommendation
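With Ver. 2, the kNN similarity runs on rows of the reduced item profile U_k * S_k instead of the raw 2M-dimensional engagement vectors. A minimal sketch (the profile layout and names are assumptions, not from the slides):

#include <math.h>

/* Cosine similarity between two items in the reduced space.
 * pa and pb are rows of U_k * S_k: rank-dimensional item profiles. */
double reduced_cosine(const double *pa, const double *pb, int rank) {
    double dot = 0, na = 0, nb = 0;
    for (int f = 0; f < rank; f++) {
        dot += pa[f] * pb[f];
        na  += pa[f] * pa[f];
        nb  += pb[f] * pb[f];
    }
    double denom = sqrt(na) * sqrt(nb);
    return denom > 0 ? dot / denom : 0.0;
}

The inner loop shrinks from u_cnt = 2M to the chosen rank, which is what makes the all-pairs 100K x 100K comparison tractable.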
Synthetic Data Set
- Why do we use a synthetic data set?
- So we can test our new model in a controlled environment.
Synthetic Data Set
- 16-latent-factor synthetic e-commerce data set
    - Dimension: 1,000 (items) by 20,000 (users)
    - 16 user preference factors
    - 16 item property factors (non-negative)
    - Txn Set: n = 55,360, sparsity = 99.72%
    - Txn+Cart Set: n = 192,985, sparsity = 99.03%
    - Download: http://www.IntelligentMining.com/dataset/

  Sample rows (user_id, item_id, type):
    10   42    0.25
    10   997   0.25
    10   950   0.25
    11   836   0.25
    11   225   1
Synthetic Data Set

  Item property factors (1K x 16 matrix) x User preference factors (16 x 20K matrix)
    = Purchase likelihood scores (1K x 20K matrix of entries X_ij, items x users)

  X32 = (a, b, c) . (x, y, z) = a*x + b*y + c*z
  X32 = likelihood of Item 3 being purchased by User 2
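A sketch of how that likelihood matrix is produced (dimensions are from the slide; the row-major factor layout and names are illustrative, and the caller heap-allocates the 1K x 20K output):

#define N_ITEMS   1000
#define N_USERS   20000
#define N_FACTORS 16

/* likelihood[i][u] = (item i's property factors) . (user u's preference factors) */
void likelihood_scores(const double item_factors[][N_FACTORS],  /* 1K x 16 */
                       const double user_factors[][N_FACTORS],  /* 20K x 16, transposed from the slide's 16 x 20K */
                       double likelihood[][N_USERS]) {          /* 1K x 20K */
    for (int i = 0; i < N_ITEMS; i++)
        for (int u = 0; u < N_USERS; u++) {
            double s = 0;
            for (int f = 0; f < N_FACTORS; f++)
                s += item_factors[i][f] * user_factors[u][f];
            likelihood[i][u] = s;
        }
}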
Synthetic Data Set

  Generating purchases for each user (User 1 shown):
  1. Sort the user’s likelihood scores: X11, X21, X31, X41, X51 -> X31, X41, X21, X51, X11.
  2. Based on the distribution, pre-determine the number of items purchased by the user (here # of items = 2).
  3. From the top of the sorted list, select and skip certain items to create data sparsity.

- Result: User 1 purchased Item 4 and Item 1.
Experiment Setup
- Each model (Random / kNN / SVD+kNN) will generate top-20 recommendations for each item.
- Compare the model output to the actual top 20 provided by the synthetic data set.
- Evaluation metrics (a precision sketch follows below):
    - Precision %: overlap of the top 20 between model output and actual (the higher the better):

      Precision = \frac{|\{Found\_Top20\_items\} \cap \{Actual\_Top20\_items\}|}{|\{Found\_Top20\_items\}|}

    - Quality metric: average of the actual rankings of the items in the model output (the lower the better). For example, output whose items actually rank [1, 2, 30, 47, 50, 21] beats output whose items rank [1, 2, 368, 62, 900, 510].
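As a sketch of the precision metric (a hypothetical helper; the list length follows the slide's top-20 setup):

/* Precision for one item: fraction of the model's top-20 recommendations
 * that also appear in the actual top-20 list. */
double precision_at_20(const long found[20], const long actual[20]) {
    int hits = 0;
    for (int i = 0; i < 20; i++)
        for (int j = 0; j < 20; j++)
            if (found[i] == actual[j]) { hits++; break; }
    return hits / 20.0;
}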
Experimental Result
- kNN vs. Random (control)

  [Charts: Precision % (higher is better) and Quality (lower is better), kNN vs. Random]
Experimental Result
- Precision % of SVD+kNN

  [Plot: Precision % (higher is better) vs. SVD rank, with an improvement region marked]
Experimental Result
- Quality of SVD+kNN

  [Plot: Quality (lower is better) vs. SVD rank, with an improvement region marked]
Experimental Result
- The effect of using Cart data

  [Plot: Precision % (higher is better) vs. SVD rank]
Experimental Result
- The effect of using Cart data

  [Plot: Quality (lower is better) vs. SVD rank]
Outline
- Predictive modeling methodology
- k-Nearest Neighbor (kNN) algorithm
- Singular value decomposition (SVD) method for dimensionality reduction
- Using a synthetic data set to test and improve your model
- Experiment and results
References
- J.S. Breese, D. Heckerman and C. Kadie, "Empirical Analysis of Predictive Algorithms for Collaborative Filtering," in Proceedings of the Fourteenth Conference on Uncertainty in Artificial Intelligence (UAI 1998), 1998.
- B. Sarwar, G. Karypis, J. Konstan and J. Riedl, "Item-Based Collaborative Filtering Recommendation Algorithms," in Proceedings of the Tenth International Conference on the World Wide Web (WWW 10), pp. 285-295, 2001.
- B. Sarwar, G. Karypis, J. Konstan and J. Riedl, "Application of Dimensionality Reduction in Recommender System: A Case Study," in ACM WebKDD 2000 Web Mining for E-Commerce Workshop, 2000.
- Apache Lucene Mahout http://lucene.apache.org/mahout/
- Cofi: A Java-Based Collaborative Filtering Library http://www.nongnu.org/cofi/
Thank you
- Any questions or comments?
