Recommendation Engine with In-Database Machine Learning

Changran Liu, Mingxi Wu

| GRAPHAIWORLD.COM | #GRAPHAIWORLD |
Outline
● In-database model training
● Latent factor recommendation model
● Distributed model training in Graph
● Demo
● GSQL implementation

Traditional Model Training Pipeline
Database:
● data storage
● data update
● preprocess data
→ training data →
Machine learning platform:
● model training
● model validation
→ model →
Applications (request → results):
● place order
● recommendation
● ...

In-Database Model Training
In-situ machine learning in the database:
● No need to export data
● Better support for continuous model training over evolving data
● Fewer limitations on model size
● Support for distributed model training
Applications (request → results):
● recommendation
● fraud detection
● ...

Movie Recommendation
Inputs: movie features, user ratings
Goals:
● Predict users' ratings for movies they haven't seen, based on their previous ratings
● Recommend movies to users based on the predicted ratings

Movie Rating Prediction (Latent factors model)

Movie                 Alice  Bob  Carol  Dave
Love at last            5     5     0     0
Romance forever         5     ?     ?     0
Cute puppies of love    ?     4     0     ?
Toy story               ?     ?     ?     5
Sword vs. karate        0     0     5     ?
Nonstop car chases      0     0     5     4
(? = not yet rated)

● Each user j has a latent factor vector θ(j)
● Each movie i has a latent factor vector x(i)
● Predict user j's rating of movie i by $(\theta^{(j)})^T x^{(i)}$

User vectors: θ(1) = [5, 0] (Alice), θ(2) = [5, 0] (Bob), θ(3) = [0, 5] (Carol), θ(4) = [0, 5] (Dave)
Movie vectors (table order): x(1) = [0.9, 0], x(2) = [1, 0.1], x(3) = [0.9, 0], x(4) = [0.1, 1], x(5) = [0.1, 1], x(6) = [0, 0.9]
Predicted ratings for Alice, $(\theta^{(1)})^T x^{(i)}$ for i = 1, ..., 6: 4.5, 5, 4.5, 0.5, 0.5, 0
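The prediction rule is easy to check numerically. Below is a minimal Python sketch using the vector values listed on the slide. User vectors θ are ordered Alice, Bob, Carol, Dave, and movie vectors x follow the table rows. The helper names `dot` and `predict` are illustrative and do not come from the talk.

```python
# Latent factor prediction for the toy example: rating = theta_j^T x_i.
USERS = ["Alice", "Bob", "Carol", "Dave"]
MOVIES = ["Love at last", "Romance forever", "Cute puppies of love",
          "Toy story", "Sword vs. karate", "Nonstop car chases"]

# Vectors as listed on the slide: theta per user, x per movie (table order).
theta = [[5, 0], [5, 0], [0, 5], [0, 5]]
x = [[0.9, 0], [1, 0.1], [0.9, 0], [0.1, 1], [0.1, 1], [0, 0.9]]

def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(p * q for p, q in zip(a, b))

def predict(j, i):
    """Predicted rating of movie i by user j (0-based indices)."""
    return dot(theta[j], x[i])

# Full prediction matrix: one row per movie, one column per user.
predictions = [[predict(j, i) for j in range(len(USERS))]
               for i in range(len(MOVIES))]

if __name__ == "__main__":
    print(f"{'':22s}" + "".join(f"{u:>7s}" for u in USERS))
    for name, row in zip(MOVIES, predictions):
        print(f"{name:22s}" + "".join(f"{r:7.2f}" for r in row))
```

The matrix also fills in every "?" entry of the table, and those filled-in values are what the model ranks when it recommends movies.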

Movie Rating Prediction (Latent factors model), continued
This is the same example with the two latent dimensions labeled. The first component of each vector corresponds to romance and the second to action. Alice's vector θ(1) = [5, 0] therefore expresses a strong preference for romance and none for action. As a result, her predicted ratings are high for the first three movies (4.5, 5, 4.5), whose vectors load on romance, and low for the last three (0.5, 0.5, 0), whose vectors load on action.

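Cost Function
Terms: RMSE (prediction error on the observed ratings) + regularization
Rating graph:
● User 1 (θ(1)) rated Movie 1 (x(1)) with rating y(1,1) and Movie 2 (x(2)) with rating y(1,2)
● User 2 (θ(2)) rated Movie 2 (x(2)) with rating y(2,2) and Movie 3 (x(3)) with rating y(2,3)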
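The cost formula itself was lost in extraction, and only its two labels (RMSE, regularization) survive. Below is a common regularized squared-error form for this model. It is an assumption, not the slide's exact formula. Here $y^{(j,i)}$ is user $j$'s rating of movie $i$, matching the edge labels above, and $\lambda$ is a regularization weight.

```latex
\begin{aligned}
J &= \frac{1}{2} \sum_{(j,i)\,:\,j \text{ rated } i} \Big( (\theta^{(j)})^T x^{(i)} - y^{(j,i)} \Big)^2
   + \frac{\lambda}{2} \sum_{j} \lVert \theta^{(j)} \rVert^2
   + \frac{\lambda}{2} \sum_{i} \lVert x^{(i)} \rVert^2 \\
\frac{\partial J}{\partial \theta^{(j)}} &= \sum_{i\,:\,j \text{ rated } i} \Big( (\theta^{(j)})^T x^{(i)} - y^{(j,i)} \Big)\, x^{(i)} + \lambda\, \theta^{(j)} \\
\frac{\partial J}{\partial x^{(i)}} &= \sum_{j\,:\,j \text{ rated } i} \Big( (\theta^{(j)})^T x^{(i)} - y^{(j,i)} \Big)\, \theta^{(j)} + \lambda\, x^{(i)}
\end{aligned}
```

Minimizing the squared-error sum over a fixed set of ratings also minimizes their RMSE, because the RMSE is a monotone function of that sum. With $\lambda = 0$, the per-user gradient is exactly the sum of the per-edge terms built in the following slides.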

Model Training in Graph
Same rating graph: User 1 (θ(1)) rated Movies 1 and 2 (ratings y(1,1), y(1,2)); User 2 (θ(2)) rated Movies 2 and 3 (ratings y(2,2), y(2,3)); movie vectors x(1), x(2), x(3).
Phase 1:
● Collect x(i) and y(j,i) from the movies that each user j rated

Phase 1 (continued): each rating edge delivers a tuple [θ(j), x(i), y(j,i)] to its user:
● User 1: [θ(1), x(1), y(1,1)] and [θ(1), x(2), y(1,2)]
● User 2: [θ(2), x(2), y(2,2)] and [θ(2), x(3), y(2,3)]

Phase 1 (continued):
● Compute the gradient contributed by each rated movie:
  [θ(1), x(1), y(1,1)] → $\big((\theta^{(1)})^T x^{(1)} - y^{(1,1)}\big)\, x^{(1)}$
  [θ(1), x(2), y(1,2)] → $\big((\theta^{(1)})^T x^{(2)} - y^{(1,2)}\big)\, x^{(2)}$
  [θ(2), x(2), y(2,2)] → $\big((\theta^{(2)})^T x^{(2)} - y^{(2,2)}\big)\, x^{(2)}$
  [θ(2), x(3), y(2,3)] → $\big((\theta^{(2)})^T x^{(3)} - y^{(2,3)}\big)\, x^{(3)}$
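Phase 1 can be sketched in plain Python. The sketch makes several assumptions. Vectors are lists, and the edge list mirrors the figure: User 1 rated Movies 1 and 2, and User 2 rated Movies 2 and 3. The numeric values are borrowed from the GSQL Training Block example later in this deck, which has the same shape. `edge_gradient` is a hypothetical helper name.

```python
# Phase 1 sketch: every rating edge (user j, movie i, rating y) yields the
# tuple [theta_j, x_i, y], which becomes the gradient term (theta_j^T x_i - y) * x_i.

def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(p * q for p, q in zip(a, b))

def edge_gradient(theta_j, x_i, y_ji):
    """Gradient of 0.5 * (theta_j^T x_i - y_ji)^2 with respect to theta_j."""
    err = dot(theta_j, x_i) - y_ji
    return [err * v for v in x_i]

theta = {1: [1.5, 1.7], 2: [1.0, 1.5]}                          # user vectors
x = {1: [2.0, 2.3], 2: [2.0, 1.3], 3: [1.0, 1.3]}               # movie vectors
ratings = {(1, 1): 5.0, (1, 2): 5.0, (2, 2): 0.0, (2, 3): 4.0}  # (user, movie) -> y

# One gradient term per rated movie, keyed by the (user, movie) edge.
contributions = {
    (j, i): edge_gradient(theta[j], x[i], y) for (j, i), y in ratings.items()
}

if __name__ == "__main__":
    for edge, term in contributions.items():
        print(edge, [round(t, 3) for t in term])
```

Each term depends only on the data held by one edge and its two endpoints. That locality is what lets the computation run in parallel across the graph.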

Phase 2:
● Aggregate the gradient for each user:
  User 1: $\big((\theta^{(1)})^T x^{(1)} - y^{(1,1)}\big)\, x^{(1)} + \big((\theta^{(1)})^T x^{(2)} - y^{(1,2)}\big)\, x^{(2)}$
  User 2: $\big((\theta^{(2)})^T x^{(2)} - y^{(2,2)}\big)\, x^{(2)} + \big((\theta^{(2)})^T x^{(3)} - y^{(2,3)}\big)\, x^{(3)}$

Phase 2 (continued):
● Update each user's latent vector by gradient descent: $\theta^{(j)} \leftarrow \theta^{(j)} - \alpha \cdot g^{(j)}$, where $g^{(j)}$ is user $j$'s aggregated gradient and $\alpha$ is the learning rate
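Phase 2 reduces to a sum and a scaled subtraction per user. This minimal sketch assumes User 1's per-edge terms, with errors 1.91 and 0.21 on movie vectors [2.0, 2.3] and [2.0, 1.3]. Those values are taken from the GSQL Training Block walk-through later in the deck. It also assumes a learning rate alpha = 0.01. `aggregate_and_update` is a hypothetical name.

```python
def aggregate_and_update(theta_j, terms, alpha):
    """Phase 2 for one user: sum the per-movie gradient terms, then take one
    gradient-descent step theta_j <- theta_j - alpha * gradient."""
    grad = [sum(term[k] for term in terms) for k in range(len(theta_j))]
    new_theta = [t - alpha * g for t, g in zip(theta_j, grad)]
    return new_theta, grad

# Per-edge terms (theta^T x - y) * x for User 1, whose theta is [1.5, 1.7].
terms_user1 = [
    [1.91 * v for v in [2.0, 2.3]],   # movie 1: error 1.91
    [0.21 * v for v in [2.0, 1.3]],   # movie 2: error 0.21
]

new_theta1, grad1 = aggregate_and_update([1.5, 1.7], terms_user1, alpha=0.01)

if __name__ == "__main__":
    print("gradient:", [round(g, 2) for g in grad1])
    print("updated theta:", [round(t, 2) for t in new_theta1])
```

The gradient comes out at about [4.24, 4.67] and the updated vector at about [1.46, 1.65]. Rounded to the precision shown, these agree with the grad(θ) and θ' values on the walk-through slides.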

Training
● Split data (splitData.gsql)
● Initialize latent factor vectors (initialization.gsql)
● Compute the difference between prediction and label (training.gsql)
● Converged? yes → finish; no → update the latent vectors using gradient descent and repeat the previous step
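The flowchart's loop can be sketched end to end in plain Python. This is not the content of splitData.gsql, initialization.gsql or training.gsql, which the deck does not show in full. It is a hypothetical single-machine version with several assumptions. Vectors are initialized randomly in [0, 1). Training uses full-batch gradient descent with learning rate ALPHA and an L2 weight LAMBDA. Convergence is declared when the cost changes by less than TOL. The ratings are the known entries of the movie table, with users and movies indexed from 0.

```python
import math
import random

# Hypothetical hyperparameters (not taken from the talk).
ALPHA = 0.01      # learning rate
LAMBDA = 0.1      # L2 regularization weight
TOL = 1e-6        # stop when the cost changes by less than this
MAX_ITERS = 5000

# Known entries of the movie table as (user, movie, rating);
# users 0-3 = Alice, Bob, Carol, Dave; movies 0-5 in table order.
RATINGS = [
    (0, 0, 5), (1, 0, 5), (2, 0, 0), (3, 0, 0),
    (0, 1, 5), (3, 1, 0),
    (1, 2, 4), (2, 2, 0),
    (3, 3, 5),
    (0, 4, 0), (1, 4, 0), (2, 4, 5),
    (0, 5, 0), (1, 5, 0), (2, 5, 5), (3, 5, 4),
]

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def split_data(ratings, test_fraction=0.2, seed=0):
    """Split data: shuffle the rating triples and split them into train/test sets."""
    items = list(ratings)
    random.Random(seed).shuffle(items)
    n_test = int(len(items) * test_fraction)
    return items[n_test:], items[:n_test]

def init_vectors(keys, k, rng):
    """Initialize: small random latent vectors of length k."""
    return {key: [rng.random() for _ in range(k)] for key in keys}

def cost(data, theta, x, lam=LAMBDA):
    """Half squared error on observed ratings plus an L2 penalty."""
    err = sum((dot(theta[u], x[m]) - y) ** 2 for u, m, y in data)
    reg = sum(dot(v, v) for v in theta.values()) + sum(dot(v, v) for v in x.values())
    return 0.5 * err + 0.5 * lam * reg

def train(train_set, users, movies, k=2, seed=0):
    """Gradient descent until the cost stops changing."""
    rng = random.Random(seed)
    theta, x = init_vectors(users, k, rng), init_vectors(movies, k, rng)
    history = [cost(train_set, theta, x)]
    for _ in range(MAX_ITERS):
        # Regularization part of the gradient, then the per-rating error terms.
        g_theta = {u: [LAMBDA * t for t in v] for u, v in theta.items()}
        g_x = {m: [LAMBDA * t for t in v] for m, v in x.items()}
        for u, m, y in train_set:
            delta = dot(theta[u], x[m]) - y   # difference between prediction and label
            for d in range(k):
                g_theta[u][d] += delta * x[m][d]
                g_x[m][d] += delta * theta[u][d]
        # Update all latent vectors simultaneously.
        theta = {u: [t - ALPHA * g for t, g in zip(v, g_theta[u])] for u, v in theta.items()}
        x = {m: [t - ALPHA * g for t, g in zip(v, g_x[m])] for m, v in x.items()}
        history.append(cost(train_set, theta, x))
        if abs(history[-2] - history[-1]) < TOL:   # converged?
            break
    return theta, x, history

def rmse(data, theta, x):
    """Root-mean-square error of the predictions on a set of ratings."""
    return math.sqrt(sum((dot(theta[u], x[m]) - y) ** 2 for u, m, y in data) / len(data))

if __name__ == "__main__":
    train_set, test_set = split_data(RATINGS)
    theta, x, history = train(train_set, range(4), range(6))
    print(f"iterations: {len(history) - 1}, final training cost: {history[-1]:.3f}")
    print(f"test RMSE: {rmse(test_set, theta, x):.3f}")
```

The held-out RMSE from `split_data` gives a rough check on whether the learned vectors generalize beyond the ratings they were fitted to.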

Demo

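GSQL Training Block
// One gradient-descent iteration over every rating edge (user s rated movie t).
// dotProduct() and product() are helper functions (presumably user-defined, not
// GSQL built-ins): the dot product of two vectors, and a vector times a scalar.
// @Gradient is not reset here; it must be cleared before the next iteration (not shown).
USERs = SELECT s FROM USERs:s -(rate:e)-> MOVIE:t
  ACCUM
    // prediction error on this edge
    DOUBLE prediction = dotProduct(s.@theta, t.@x),
    DOUBLE delta = prediction - e.rating,
    // delta * x goes to the user's gradient, delta * theta to the movie's
    s.@Gradient += product(t.@x, delta),
    t.@Gradient += product(s.@theta, delta)
  POST-ACCUM
    // gradient-descent step with learning rate alpha
    s.@theta += product(s.@Gradient, -alpha),
    t.@x += product(t.@Gradient, -alpha);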
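Example subgraph (movies Romance forever, Love at last and Nonstop car chases):
● Alice, θ = [1.5, 1.7]: gave 5 to the movie with x = [2.0, 2.3] and 5 to the movie with x = [2.0, 1.3]
● Dave, θ = [1.0, 1.5]: gave 0 to the movie with x = [2.0, 1.3] and 4 to Nonstop car chases (x = [1.0, 1.3])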

GSQL Training Block, ACCUM: prediction θᵀx on each rating edge
● Alice, x = [2.0, 2.3] (rating 5): prediction 6.9
● Alice, x = [2.0, 1.3] (rating 5): prediction 5.2
● Dave, x = [2.0, 1.3] (rating 0): prediction 4.0
● Dave, x = [1.0, 1.3] (rating 4): prediction 3.0

GSQL Training Block, ACCUM: δ = prediction - rating on each edge
● Alice, x = [2.0, 2.3]: δ = 1.9
● Alice, x = [2.0, 1.3]: δ = 0.2
● Dave, x = [2.0, 1.3]: δ = 4.0
● Dave, x = [1.0, 1.3]: δ = -1.1

GSQL Training Block, ACCUM: user gradients, grad(θ) = sum of δ·x over the movies the user rated
● Alice, θ = [1.5, 1.7]: grad(θ) = [4.2, 4.7]
● Dave, θ = [1.0, 1.5]: grad(θ) = [6.9, 3.8]

GSQL Training Block, ACCUM: movie gradients, grad(x) = sum of δ·θ over the users who rated the movie
● x = [2.0, 2.3]: grad(x) = [2.9, 3.2]
● x = [2.0, 1.3]: grad(x) = [4.3, 6.3]
● x = [1.0, 1.3]: grad(x) = [-1.1, -1.6]

GSQL Training Block, POST-ACCUM: θ' = θ - α·grad(θ) and x' = x - α·grad(x), with α = 0.01
● Alice: θ = [1.5, 1.7] → θ' = [1.46, 1.65]
● Dave: θ = [1.0, 1.5] → θ' = [0.93, 1.46]
● x = [2.0, 2.3] → x' = [1.97, 2.27]
● x = [2.0, 1.3] → x' = [1.96, 1.24]
● x = [1.0, 1.3] → x' = [1.01, 1.32]

GSQL Training Block, after one iteration: errors recomputed with the updated vectors
● Alice, x = [2.0, 2.3]: δ = 1.9 → δ' = 1.6
● Alice, x = [2.0, 1.3]: δ = 0.2 → δ' = -0.1
● Dave, x = [2.0, 1.3]: δ = 4.0 → δ' = 3.6
● Dave, x = [1.0, 1.3]: δ = -1.1 → δ' = -1.1
The first three errors shrink after a single step; the last stays at about -1.1.
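The whole walk-through can be reproduced outside the database. Below is a plain-Python sketch of one iteration of the GSQL block, using the vectors and ratings on these slides. Which user rated which movie is inferred from the predictions shown. The keys "alice", "dave" and "m1" to "m3" are illustrative labels, with m3 = Nonstop car chases. As in the GSQL block, the vectors change only in the update phase, so every edge reads the old vectors.

```python
ALPHA = 0.01  # learning rate from the slide

# Vectors and rating edges of the example subgraph; the keys are illustrative
# labels (m3 = Nonstop car chases), and the edge list is inferred from the
# predictions shown on the slides.
theta = {"alice": [1.5, 1.7], "dave": [1.0, 1.5]}
x = {"m1": [2.0, 2.3], "m2": [2.0, 1.3], "m3": [1.0, 1.3]}
edges = [("alice", "m1", 5.0), ("alice", "m2", 5.0),
         ("dave", "m2", 0.0), ("dave", "m3", 4.0)]

def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(p * q for p, q in zip(a, b))

def training_step(theta, x, edges, alpha=ALPHA):
    """One iteration: ACCUM-style gradient accumulation, then POST-ACCUM-style updates."""
    # Gradients start from zero on every call.
    g_theta = {u: [0.0] * len(v) for u, v in theta.items()}
    g_x = {m: [0.0] * len(v) for m, v in x.items()}
    deltas = []
    # ACCUM: each edge reads the old vectors and adds its gradient terms.
    for u, m, rating in edges:
        delta = dot(theta[u], x[m]) - rating
        deltas.append(delta)
        for d in range(len(theta[u])):
            g_theta[u][d] += x[m][d] * delta
            g_x[m][d] += theta[u][d] * delta
    # POST-ACCUM: one gradient-descent step per user and per movie.
    new_theta = {u: [t - alpha * g for t, g in zip(v, g_theta[u])] for u, v in theta.items()}
    new_x = {m: [t - alpha * g for t, g in zip(v, g_x[m])] for m, v in x.items()}
    return new_theta, new_x, deltas, g_theta, g_x

new_theta, new_x, deltas, g_theta, g_x = training_step(theta, x, edges)
# Errors recomputed with the updated vectors (the delta' values on the last slide).
new_deltas = [dot(new_theta[u], new_x[m]) - r for u, m, r in edges]

if __name__ == "__main__":
    print("delta :", [round(d, 2) for d in deltas])
    print("delta':", [round(d, 2) for d in new_deltas])
```

Before the step, the errors are about 1.91, 0.21, 3.95 and -1.05. After it, they are about 1.62, -0.10, 3.63 and -1.13. Up to rounding, both sets agree with the one-decimal values on the slides.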

User-Rate-Movie Graph
Example graph: movies Toy story (Disney, ...) and Iron man (Marvel, Action, ...); users Alice (Disney fan, Marvel fan, ...) and Bob (Marvel fan, ...); rating edges with values 5, 5 and 4.5, plus one unknown rating (?) to predict.
Recommendation approaches:
● Content-based method

● K-nearest neighbors

● Latent factor (model-based)

● Hybrid method
● ...
