BUILDING A PREDICTIVE MODEL
AN EXAMPLE OF A PRODUCT RECOMMENDATION ENGINE

Alex Lin
Senior Architect
Intelligent Mining
alin@intelligentmining.com
Outline
- Predictive modeling methodology
- k-Nearest Neighbor (kNN) algorithm
- Singular value decomposition (SVD) method for dimensionality reduction
- Using a synthetic data set to test and improve your model
- Experiment and results
The Business Problem
- Design a product recommender solution that will increase revenue.
How Do We Increase Revenue?

  Increase Revenue
    - Increase Conversion
    - Increase Avg. Order Value
        - Increase Unit Price
        - Increase Units / Order
Example
- Is this recommendation effective?

  [Screenshot of a product recommendation, annotated with the two levers it pulls: Increase Unit Price and Increase Units / Order]
What am I going to do?
Predictive Model Framework

  Data -> Features -> ML Algorithm -> Prediction Output

  Data:       What data?
  Features:   What features?
  Algorithm:  Which algorithm?
  Output:     Cross-sell & Up-sell Recommendation
What Data to Use?
- Explicit data
    - Ratings
    - Comments
- Implicit data
    - Order history / Return history
    - Cart events
    - Page views
    - Click-thru
    - Search log
- In today’s talk we only use Order history and Cart events.
Predictive Model

  Data -> Features -> ML Algorithm -> Prediction Output

  Data:       Order History, Cart Events
  Features:   What features?
  Algorithm:  Which algorithm?
  Output:     Cross-sell & Up-sell Recommendation
What Features to Use?
- We know that a given product tends to get purchased by customers with similar tastes or needs.
- Use user engagement data to describe a product.

  Example: item 17’s user engagement vector, one entry per user 1..n, with a weight where a user engaged (e.g. 1 for an order, .25 for a cart event) and nulls elsewhere:

  item 17: [1, _, .25, _, _, .25, _, 1, _, .25, ..., _]
Data Representation / Features
- When we merge every item’s user engagement vector, we get an m x n item-user matrix.

  [m x n item-user matrix: rows are items 1..m, columns are users 1..n; each row is that item’s user engagement vector, with entries such as 1 and .25 where users engaged and nulls elsewhere]
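This matrix is extremely sparse (most users never touch most items), so in practice each row is stored sparsely rather than as a dense 2M-entry array like the one in the cosine-similarity fragment later in the deck. A minimal sketch, with illustrative names that are not from the slides:

/* One item's user engagement vector, stored sparsely: only the users
 * who engaged, with their weights (e.g. 1 for an order, .25 for a cart event). */
typedef struct {
    long   *user_ids;   /* engaged users, sorted ascending */
    double *weights;    /* weight per engaged user */
    long    nnz;        /* number of non-null entries */
} SparseRow;

/* Dot product of two sparse rows by merging the sorted id lists:
 * O(nnz_a + nnz_b) instead of O(n) over the full user space. */
double sparse_dot(const SparseRow *a, const SparseRow *b) {
    double dot = 0;
    long i = 0, j = 0;
    while (i < a->nnz && j < b->nnz) {
        if (a->user_ids[i] == b->user_ids[j])
            dot += a->weights[i++] * b->weights[j++];
        else if (a->user_ids[i] < b->user_ids[j])
            i++;
        else
            j++;
    }
    return dot;
}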
Data Normalization
- Ensure the magnitudes of the entries in the dataset matrix are appropriate.

  [Item-user matrix shown before and after normalization: raw entries of 1 become rescaled weights such as .5, .9, .92, .49, .79, .67, .46, .73, .39, .82, .76, .69, .52, .8]

- Remove column average, so frequent buyers don’t dominate the model.
Data Normalization
- Different engagement data points (Order / Cart / Page View) should have different weights.
- Common normalization strategies (a sketch of the first one follows this list):
    - Remove column average
    - Remove row average
    - Remove global mean
    - Z-score
    - Fill in the null values
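A sketch of column-average removal, assuming a dense matrix for clarity (real data would be sparse, and 0.0 stands in for "null" here; the names are illustrative): subtract each user column's average from that column's non-null entries so heavy buyers stop dominating.

/* Remove each column's (user's) average from its non-null entries.
 * data is an n_items x n_users matrix in row-major layout. */
void remove_column_average(double *data, long n_items, long n_users) {
    for (long u = 0; u < n_users; u++) {
        double sum = 0;
        long cnt = 0;
        for (long i = 0; i < n_items; i++) {
            double v = data[i * n_users + u];
            if (v != 0.0) { sum += v; cnt++; }
        }
        if (cnt == 0) continue;           /* user never engaged: nothing to do */
        double avg = sum / cnt;
        for (long i = 0; i < n_items; i++)
            if (data[i * n_users + u] != 0.0)
                data[i * n_users + u] -= avg;
    }
}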
Predictive Model

  Data -> Features -> ML Algorithm -> Prediction Output

  Data:       Order History, Cart Events
  Features:   User engagement vector + Data Normalization
  Algorithm:  Which algorithm?
  Output:     Cross-sell & Up-sell Recommendation
Which Algorithm?
- How do we find the items that have similar user engagement data?

  [Item-user matrix with rows for items 1, 2, 17, 18, ..., m: items 1 and 17 share several engaged users]

- We can find the items that have similar user engagement vectors with the kNN algorithm.
k-Nearest Neighbor (kNN)
- Find the k items that have the most similar user engagement vectors.

  [Item-user matrix with rows for items 1, 2, 3, 4, ..., m; item 4’s row overlaps most with the rows of items 2, 3, and 1]

- Nearest Neighbors of Item 4 = [2, 3, 1]
Similarity Measure for kNN

  Example rows from the item-user matrix:
    item 2 (vector a): (1, .5, 1) over its three engaged users
    item 4 (vector b): (1, .5, 1, 1) over its four engaged users

- Jaccard coefficient:

    sim(a,b) = \frac{(1+1)}{(1+1+1) + (1+1+1+1) - (1+1)}

- Cosine similarity:

    sim(a,b) = \cos(a,b) = \frac{a \cdot b}{\|a\|_2 \, \|b\|_2}
             = \frac{1 \cdot 1 + 0.5 \cdot 1}{\sqrt{1^2 + 0.5^2 + 1^2} \cdot \sqrt{1^2 + 0.5^2 + 1^2 + 1^2}}

- Pearson correlation:

    corr(a,b) = \frac{\sum_i (r_{ai} - \bar{r}_a)(r_{bi} - \bar{r}_b)}
                     {\sqrt{\sum_i (r_{ai} - \bar{r}_a)^2} \, \sqrt{\sum_i (r_{bi} - \bar{r}_b)^2}}
              = \frac{m \sum a_i b_i - \sum a_i \sum b_i}
                     {\sqrt{m \sum a_i^2 - (\sum a_i)^2} \, \sqrt{m \sum b_i^2 - (\sum b_i)^2}}
              = \frac{match\_cols \cdot Dotprod(a,b) - sum(a) \cdot sum(b)}
                     {\sqrt{match\_cols \cdot sum(a^2) - (sum(a))^2} \, \sqrt{match\_cols \cdot sum(b^2) - (sum(b))^2}}
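A minimal C sketch of the one-pass Pearson form above, computed over the m matched columns of the two vectors (an illustrative helper, not from the slides):

#include <math.h>

/* Pearson correlation over m matched columns, using the one-pass sums form:
 * (m*dot - sa*sb) / (sqrt(m*saa - sa^2) * sqrt(m*sbb - sb^2)) */
double pearson(const double *a, const double *b, long m) {
    double sa = 0, sb = 0, saa = 0, sbb = 0, dot = 0;
    for (long i = 0; i < m; i++) {
        sa  += a[i];        sb  += b[i];
        saa += a[i] * a[i]; sbb += b[i] * b[i];
        dot += a[i] * b[i];
    }
    double denom = sqrt(m * saa - sa * sa) * sqrt(m * sbb - sb * sb);
    return denom > 0 ? (m * dot - sa * sb) / denom : 0.0;
}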
k-Nearest Neighbor (kNN)

  [Scatter plot of items 1-9 in the item feature space, with cosine similarity as the similarity measure]

  kNN, k=5: Nearest Neighbors(8) = [9, 6, 3, 1, 2]
Predictive Model
- Ver. 1: kNN

  Data -> Features -> ML Algorithm -> Prediction Output

  Data:       Order History, Cart Events
  Features:   User engagement vector + Data Normalization
  Algorithm:  k-Nearest Neighbor (kNN)
  Output:     Cross-sell & Up-sell Recommendation
Cosine Similarity – Code fragment

#include <stdio.h>
#include <math.h>

long i_cnt = 100000;         // number of items: 100K
long u_cnt = 2000000;        // number of users: 2M
double data[i_cnt][u_cnt];   // 100K by 2M dataset matrix (in reality, it needs to be malloc allocation)
double norm[i_cnt];          // vector norm of each item row

// assume data matrix is loaded
……

// calculate vector norm for each user engagement vector
for (long i = 0; i < i_cnt; i++) {
    norm[i] = 0;
    for (long f = 0; f < u_cnt; f++) {
        norm[i] += data[i][f] * data[i][f];
    }
    norm[i] = sqrt(norm[i]);
}

// cosine similarity calculation
for (long i = 0; i < i_cnt; i++) {          // loop thru 100K items
    for (long j = 0; j < i_cnt; j++) {      // loop thru 100K items
        double dot_product = 0;
        for (long f = 0; f < u_cnt; f++) {  // loop thru entire user space: 2M
            dot_product += data[i][f] * data[j][f];
        }
        printf("%ld %ld %lf\n", i, j, dot_product / (norm[i] * norm[j]));
    }
}
// find the Top K nearest neighbors here
…….

Two problems with this brute-force version:
1. 100K rows x 100K rows x 2M features --> scalability problem.
   Remedies: kd-tree, locality-sensitive hashing, MapReduce/Hadoop, multicore/threading, stream processors.
2. data[i] is high-dimensional and sparse, so similarity measures are not reliable --> accuracy problem.

This leads us to SVD dimensionality reduction!
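The fragment stops at "find the Top K nearest neighbors here". One way to finish it, sketched under the assumption that we keep a per-item list of the K best (item, similarity) pairs and insert each candidate as it is computed inside the j loop:

#define K 20

/* Illustrative top-K tracker: insert (item, sim) into a list kept
 * sorted by descending similarity. O(K) per candidate. */
typedef struct { long item; double sim; } Neighbor;

void topk_insert(Neighbor *top, int *n, long item, double sim) {
    if (*n == K && sim <= top[K - 1].sim) return;  /* not better than current worst */
    int pos = (*n < K) ? (*n)++ : K - 1;           /* append, or overwrite the worst */
    while (pos > 0 && top[pos - 1].sim < sim) {    /* shift worse entries down */
        top[pos] = top[pos - 1];
        pos--;
    }
    top[pos].item = item;
    top[pos].sim = sim;
}

Since inserting is O(K) per candidate, the top-K step adds very little on top of the dot products themselves.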
Singular Value Decomposition (SVD)

  A = U \times S \times V^T

  A:   m x n matrix (items x users)
  U:   m x r matrix
  S:   r x r matrix
  V^T: r x n matrix

  Keeping only the top k singular values (rank = k, k < r) gives the low-rank approximation:

  A_k = U_k \times S_k \times V_k^T

- Low-rank approx. item profile is U_k * S_k
- Low-rank approx. user profile is S_k * V_k^T
- Low-rank approx. item-user matrix is U_k * S_k * S_k * V_k^T
Reduced SVD

  A_k = U_k \times S_k \times V_k^T

  A_k:   100K x 2M matrix (items x users)
  U_k:   100K x 3 matrix
  S_k:   3 x 3 diagonal matrix of descending singular values, e.g. diag(7, 3, 1)
  V_k^T: 3 x 2M matrix

  (rank = 3)

- Low-rank approx. item profile is U_k * S_k
SVD Factor Interpretation

  [Singular values plot (rank = 512): values descend from left to right. The more significant values on the left correspond to latent factors; the tail of less significant values corresponds to noise and other residue. Example S (3 x 3): diag(7, 3, 1), descending singular values.]
SVD Dimensionality Reduction

  [U_k * S_k: an items x latent-factors matrix, far narrower than the items x users matrix; each row is an item profile over the latent factors, with candidate ranks from 3 to 10 shown]

- Need to find the optimal low rank!
Missing values
- Difference between “0” and “unknown”
- Missing values do NOT appear randomly.
- Value = (Preference Factors) + (Availability) – (Purchased elsewhere) – (Navigation inefficiency) – etc.
- Approx. Value = (Preference Factors) +/- (Noise)
- Modeling missing values correctly will help us make good recommendations, especially when working with an extremely sparse data set.
Singular Value Decomposition (SVD)
- Use SVD to reduce dimensionality, so neighborhood formation happens in the reduced user space.
- SVD helps the model find the low-rank approximation of the dataset matrix, while retaining the critical latent factors and ignoring noise.
- The optimal low rank needs to be tuned.
- SVD is computationally expensive.
- SVD libraries:
    - Matlab: [U, S, V] = svds(A, 256);
    - SVDPACKC http://www.netlib.org/svdpack/
    - SVDLIBC http://tedlab.mit.edu/~dr/SVDLIBC/
    - GHAPACK http://www.dcs.shef.ac.uk/~genevieve/ml.html
Predictive Model
- Ver. 2: SVD+kNN

  Data -> Features -> ML Algorithm -> Prediction Output

  Data:       Order History, Cart Events
  Features:   User engagement vector + Data Normalization + SVD
  Algorithm:  k-Nearest Neighbors (kNN) in reduced space
  Output:     Cross-sell & Up-sell Recommendation
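With Ver. 2, the kNN similarity runs on rows of the reduced item profile U_k * S_k instead of the raw 2M-dimensional engagement vectors. A minimal sketch (the profile layout and names are assumptions, not from the slides):

#include <math.h>

/* Cosine similarity between two items in the reduced space.
 * pa and pb are rows of U_k * S_k: rank-dimensional item profiles. */
double reduced_cosine(const double *pa, const double *pb, int rank) {
    double dot = 0, na = 0, nb = 0;
    for (int f = 0; f < rank; f++) {
        dot += pa[f] * pb[f];
        na  += pa[f] * pa[f];
        nb  += pb[f] * pb[f];
    }
    double denom = sqrt(na) * sqrt(nb);
    return denom > 0 ? dot / denom : 0.0;
}

The inner loop shrinks from u_cnt = 2M to the chosen rank, which is what makes the all-pairs 100K x 100K comparison tractable.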
Synthetic Data Set
- Why do we use a synthetic data set?
- So we can test our new model in a controlled environment.
Synthetic Data Set
- 16-latent-factor synthetic e-commerce data set
    - Dimension: 1,000 (items) by 20,000 (users)
    - 16 user preference factors
    - 16 item property factors (non-negative)
    - Txn Set: n = 55,360, sparsity = 99.72%
    - Txn+Cart Set: n = 192,985, sparsity = 99.03%
    - Download: http://www.IntelligentMining.com/dataset/

  Sample rows (user_id, item_id, type):
    10   42    0.25
    10   997   0.25
    10   950   0.25
    11   836   0.25
    11   225   1
Synthetic Data Set

  Item property factors (1K x 16 matrix) x User preference factors (16 x 20K matrix)
    = Purchase likelihood scores (1K x 20K matrix of entries X_ij, items x users)

  X32 = (a, b, c) . (x, y, z) = a*x + b*y + c*z
  X32 = likelihood of Item 3 being purchased by User 2
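A sketch of how that likelihood matrix is produced (dimensions are from the slide; the row-major factor layout and names are illustrative, and the caller heap-allocates the 1K x 20K output):

#define N_ITEMS   1000
#define N_USERS   20000
#define N_FACTORS 16

/* likelihood[i][u] = (item i's property factors) . (user u's preference factors) */
void likelihood_scores(const double item_factors[][N_FACTORS],  /* 1K x 16 */
                       const double user_factors[][N_FACTORS],  /* 20K x 16, transposed from the slide's 16 x 20K */
                       double likelihood[][N_USERS]) {          /* 1K x 20K */
    for (int i = 0; i < N_ITEMS; i++)
        for (int u = 0; u < N_USERS; u++) {
            double s = 0;
            for (int f = 0; f < N_FACTORS; f++)
                s += item_factors[i][f] * user_factors[u][f];
            likelihood[i][u] = s;
        }
}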
Synthetic Data Set

  Generating purchases for each user (User 1 shown):
  1. Sort the user’s likelihood scores: X11, X21, X31, X41, X51 -> X31, X41, X21, X51, X11.
  2. Based on the distribution, pre-determine the number of items purchased by the user (here # of items = 2).
  3. From the top of the sorted list, select and skip certain items to create data sparsity.

- Result: User 1 purchased Item 4 and Item 1.
Experiment Setup
- Each model (Random / kNN / SVD+kNN) will generate top-20 recommendations for each item.
- Compare the model output to the actual top 20 provided by the synthetic data set.
- Evaluation metrics (a precision sketch follows below):
    - Precision %: overlap of the top 20 between model output and actual (the higher the better):

      Precision = \frac{|\{Found\_Top20\_items\} \cap \{Actual\_Top20\_items\}|}{|\{Found\_Top20\_items\}|}

    - Quality metric: average of the actual rankings of the items in the model output (the lower the better). For example, output whose items actually rank [1, 2, 30, 47, 50, 21] beats output whose items rank [1, 2, 368, 62, 900, 510].
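As a sketch of the precision metric (a hypothetical helper; the list length follows the slide's top-20 setup):

/* Precision for one item: fraction of the model's top-20 recommendations
 * that also appear in the actual top-20 list. */
double precision_at_20(const long found[20], const long actual[20]) {
    int hits = 0;
    for (int i = 0; i < 20; i++)
        for (int j = 0; j < 20; j++)
            if (found[i] == actual[j]) { hits++; break; }
    return hits / 20.0;
}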
Experimental Result
- kNN vs. Random (control)

  [Charts: Precision % (higher is better) and Quality (lower is better), kNN vs. Random]
Experimental Result
- Precision % of SVD+kNN

  [Plot: Precision % (higher is better) vs. SVD rank, with an improvement region marked]
Experimental Result
- Quality of SVD+kNN

  [Plot: Quality (lower is better) vs. SVD rank, with an improvement region marked]
Experimental Result
- The effect of using Cart data

  [Plot: Precision % (higher is better) vs. SVD rank]
Experimental Result
- The effect of using Cart data

  [Plot: Quality (lower is better) vs. SVD rank]
Outline
- Predictive modeling methodology
- k-Nearest Neighbor (kNN) algorithm
- Singular value decomposition (SVD) method for dimensionality reduction
- Using a synthetic data set to test and improve your model
- Experiment and results
References
- J.S. Breese, D. Heckerman and C. Kadie, "Empirical Analysis of Predictive Algorithms for Collaborative Filtering," in Proceedings of the Fourteenth Conference on Uncertainty in Artificial Intelligence (UAI 1998), 1998.
- B. Sarwar, G. Karypis, J. Konstan and J. Riedl, "Item-Based Collaborative Filtering Recommendation Algorithms," in Proceedings of the Tenth International Conference on the World Wide Web (WWW 10), pp. 285-295, 2001.
- B. Sarwar, G. Karypis, J. Konstan and J. Riedl, "Application of Dimensionality Reduction in Recommender System: A Case Study," in ACM WebKDD 2000 Web Mining for E-Commerce Workshop, 2000.
- Apache Lucene Mahout http://lucene.apache.org/mahout/
- Cofi: A Java-Based Collaborative Filtering Library http://www.nongnu.org/cofi/
Thank you
- Any questions or comments?
