Recommendation with
Differential Context Weighting
Yong Zheng
Robin Burke
Bamshad Mobasher
Center for Web Intelligence
DePaul University
Chicago, IL USA
Conference on UMAP
June 12, 2013
Overview
• Introduction (RS and Context-aware RS)
• Sparsity of Contexts and Relevant Solutions
• Differential Context Relaxation & Weighting
• Experimental Results
• Conclusion and Future Work
Introduction
• Recommender Systems
• Context-aware Recommender Systems
Recommender Systems (RS)
• Information Overload → Recommendations
Context-aware RS (CARS)
• Traditional RS: Users × Items → Ratings
• Context-aware RS: Users × Items × Contexts → Ratings
Example of contexts in different domains:
– Food: time (lunch, dinner), occasion (business lunch, family dinner)
– Movie: time (weekend, weekday), location (home, cinema), etc.
– Music: time (morning, evening), activity (study, sports, party), etc.
– Book: a book as a gift for kids or for a mother, etc.
Recommendations cannot be made well without considering contexts.
Research Problems
• Sparsity of Contexts
• Relevant Solutions
Sparsity of Contexts
• Assumption of context-aware RS: it is better to use preferences
expressed in the same contexts when making predictions.
• The same contexts? What about multiple context dimensions and sparsity?
An example in the movie domain:
Are there rating profiles in the contexts <Weekday, Home, Sister>?
User  Movie    Time     Location  Companion   Rating
U1    Titanic  Weekend  Home      Girlfriend  4
U2    Titanic  Weekday  Home      Girlfriend  5
U3    Titanic  Weekday  Cinema    Sister      4
U1    Titanic  Weekday  Home      Sister      ?
Relevant Solutions
Context Matching → the same contexts <Weekday, Home, Sister>?
1. Context Selection → use only the influential dimensions
2. Context Relaxation → use a relaxed set of dimensions, e.g., time
3. Context Weighting → use all dimensions, but measure how similar the contexts are (more on this later)
Differences between context selection and context relaxation:
– Context selection is conducted by surveys or statistics;
– Context relaxation is optimized directly for prediction performance;
– Optimal context relaxation/weighting is a learning process!
User  Movie    Time     Location  Companion   Rating
U1    Titanic  Weekend  Home      Girlfriend  4
U2    Titanic  Weekday  Home      Girlfriend  5
U3    Titanic  Weekday  Cinema    Sister      4
U1    Titanic  Weekday  Home      Sister      ?
DCR and DCW
• Differential Context Relaxation (DCR)
• Differential Context Weighting (DCW)
• Particle Swarm Optimization as the Optimizer
Differential Context Relaxation
Differential Context Relaxation (DCR) is our first attempt to alleviate
the sparsity of contexts, and differential context weighting (DCW) is a
finer-grained improvement over DCR.
• There are two notions in DCR:
– The “Differential” part → algorithm decomposition:
  ▪ separate the algorithm into different functional components;
  ▪ apply an appropriate context constraint to each component;
  ▪ maximize the global contextual effect jointly.
– The “Relaxation” part → context relaxation:
  use a relaxed set of context dimensions instead of all of them.
• References
– Y. Zheng, R. Burke, B. Mobasher. "Differential Context Relaxation for Context-aware
Travel Recommendation". In: EC-WEB 2012.
– Y. Zheng, R. Burke, B. Mobasher. "Optimal Feature Selection for Context-Aware
Recommendation using Differential Relaxation". In: RecSys 2012 Workshop on CARS.
DCR – Algorithm Decomposition
Take User-based Collaborative Filtering (UBCF) for example.
     Pirates of the   Kung Fu   Harry      Harry
     Caribbean 4      Panda 2   Potter 6   Potter 7
U1   4                4         2          2
U2   3                4         2          1
U3   2                2         4          4
U4   4                4         1          ?
Standard Process in UBCF (Top-K UserKNN, K=1 for example):
1). Find neighbors based on user-user similarity
2). Aggregate neighbors’ contribution
3). Make final predictions
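For reference, these three steps correspond to the classic mean-centered UserKNN prediction formula; the notation below is a sketch of that standard formula, not copied from the slides:

```latex
\hat{r}_{u,i} = \bar{r}_u +
  \frac{\sum_{v \in N_k(u,i)} \mathrm{sim}(u,v)\,\bigl(r_{v,i} - \bar{r}_v\bigr)}
       {\sum_{v \in N_k(u,i)} \lvert \mathrm{sim}(u,v) \rvert}
```

Here N_k(u,i) is the set of top-K neighbors of u who rated i (neighbor selection), r_{v,i} − r̄_v is the neighbor contribution, r̄_u is the user baseline, and sim(u,v) is the user similarity – exactly the four components named on the next slide.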
DCR – Algorithm Decomposition
Take User-based Collaborative Filtering (UBCF) for example.
1. Neighbor selection   2. Neighbor contribution   3. User baseline   4. User similarity
All components contribute to the final prediction; we assume an
appropriate contextual constraint on each component can leverage the
contextual effect in that component,
e.g., using only neighbors who rated in the same contexts (a sketch follows below).
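A minimal sketch of this per-component constraining, assuming hypothetical data structures (a list of (user, item, context, rating) tuples and a context-free similarity function) that are not from the slides; each component only uses ratings whose contexts match the target context on the dimensions selected by that component's relaxation vector:

```python
def matches(ctx, target_ctx, selected_dims):
    """True if ctx agrees with target_ctx on every dimension selected by the binary vector."""
    return all(ctx[d] == target_ctx[d] for d, sel in enumerate(selected_dims) if sel)

def dcr_predict(u, i, target_ctx, ratings, sim, C1, C2, C3, C4, k=10):
    """DCR-style UserKNN prediction sketch.

    ratings: list of (user, item, context_tuple, rating)
    sim(u, v): user-user similarity (component 4; C4 could restrict it further,
               kept context-free here for brevity)
    C1..C4: binary relaxation vectors, one per component.
    """
    # 1. Neighbor selection: users who rated i under contexts matching C1
    candidates = {v for (v, j, c, r) in ratings
                  if j == i and v != u and matches(c, target_ctx, C1)}
    neighbors = sorted(candidates, key=lambda v: sim(u, v), reverse=True)[:k]

    # 3. User baseline: u's average rating over contexts matching C3
    own = [r for (v, j, c, r) in ratings if v == u and matches(c, target_ctx, C3)]
    baseline = sum(own) / len(own) if own else 0.0

    num = den = 0.0
    for v in neighbors:
        # 2. Neighbor contribution: v's rating of i minus v's mean,
        #    both restricted to contexts matching C2
        v_on_i = [r for (w, j, c, r) in ratings
                  if w == v and j == i and matches(c, target_ctx, C2)]
        v_all = [r for (w, j, c, r) in ratings
                 if w == v and matches(c, target_ctx, C2)]
        if not v_on_i or not v_all:
            continue
        contribution = v_on_i[0] - sum(v_all) / len(v_all)
        s = sim(u, v)                                  # 4. User similarity
        num += s * contribution
        den += abs(s)

    return baseline + (num / den if den else 0.0)
```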
DCR – Context Relaxation
User  Movie    Time     Location  Companion   Rating
U1    Titanic  Weekend  Home      Girlfriend  4
U2    Titanic  Weekday  Home      Girlfriend  5
U3    Titanic  Weekday  Cinema    Sister      4
U1    Titanic  Weekday  Home      Sister      ?
Notion of context relaxation:
• Use {Time, Location, Companion} → 0 records matched!
• Use {Time, Location} → 1 record matched!
• Use {Time} → 2 records matched!
In DCR, we choose an appropriate context relaxation for each component,
balancing the number of matched ratings against the best performance
and the least noise.
DCR – Context Relaxation
(Prediction formula: contextual constraints C1–C4 applied to
1. Neighbor selection, 2. Neighbor contribution, 3. User baseline, 4. User similarity)
c is the original context, e.g., <Weekday, Home, Sister>.
C1, C2, C3, C4 are the relaxed contexts, one per component.
Each relaxation is modeled by a binary vector,
e.g., <1, 0, 0> denotes that only the first context dimension is selected.
Take neighbor selection for example:
originally, neighbors are the users who rated the same item;
DCR further filters those neighbors by the contextual constraint C1,
i.e., C1 = <1,0,0> → Time = Weekday → a neighbor must have rated i on weekdays
(see the sketch below).
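A small sketch, using hypothetical tuples that mirror the table above, of how a binary relaxation vector filters the candidate ratings and reproduces the matched counts from the previous slide:

```python
# (user, movie, (time, location, companion), rating) -- the rows from the table above
profiles = [
    ("U1", "Titanic", ("Weekend", "Home",   "Girlfriend"), 4),
    ("U2", "Titanic", ("Weekday", "Home",   "Girlfriend"), 5),
    ("U3", "Titanic", ("Weekday", "Cinema", "Sister"),     4),
]
target = ("Weekday", "Home", "Sister")     # the context of the prediction to make

def count_matches(relaxation):
    """Count ratings whose context matches the target on every selected dimension."""
    return sum(all(ctx[d] == target[d] for d, sel in enumerate(relaxation) if sel)
               for (_, _, ctx, _) in profiles)

print(count_matches((1, 1, 1)))   # {Time, Location, Companion} -> 0
print(count_matches((1, 1, 0)))   # {Time, Location}            -> 1
print(count_matches((1, 0, 0)))   # {Time}                      -> 2
```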
DCR – Drawbacks
1. Context relaxation is still strict, especially when data are sparse.
2. Components are dependent. For example, neighbor contribution
depends on neighbor selection: if neighbors are selected by
C1: Location = Cinema, there is no guarantee that a neighbor also has
ratings under context C2: Time = Weekend.
A finer-grained solution is required! → Differential Context Weighting
Differential Context Weighting
User  Movie    Time     Location  Companion   Rating
U1    Titanic  Weekend  Home      Girlfriend  4
U2    Titanic  Weekday  Home      Girlfriend  5
U3    Titanic  Weekday  Cinema    Sister      4
U1    Titanic  Weekday  Home      Sister      ?
– Goal: use all dimensions, but measure the similarity of contexts.
– Assumption: the more similar two contexts are, the more useful the
corresponding ratings are for prediction.
Similarity of contexts is measured by weighted Jaccard similarity:
c and d are two contexts (the two highlighted context rows in the table above);
σ is the weighting vector <w1, w2, w3> for the three dimensions.
Assuming equal weights, w1 = w2 = w3 = 1:
J(c, d, σ) = (sum of weights on matched dimensions) / (sum of all weights) = 2/3
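A minimal sketch of this weighted Jaccard similarity over context dimensions, assuming categorical dimensions compared by exact match (the helper name is hypothetical):

```python
def weighted_jaccard(c, d, sigma):
    """Weighted Jaccard similarity of two contexts.

    c, d:  tuples of context values, e.g. ("Weekday", "Home", "Sister")
    sigma: per-dimension weights <w1, w2, w3>
    """
    matched = sum(w for cv, dv, w in zip(c, d, sigma) if cv == dv)
    total = sum(sigma)
    return matched / total if total else 0.0

# With equal weights this reduces to (# matched dimensions) / (# dimensions):
print(weighted_jaccard(("Weekday", "Home", "Sister"),
                       ("Weekday", "Home", "Girlfriend"),
                       (1.0, 1.0, 1.0)))   # 0.666...
```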
Differential Context Weighting
(Prediction formula: contextual weighting applied to
1. Neighbor selection, 2. Neighbor contribution, 3. User baseline, 4. User similarity)
1. The “Differential” part → the components are the same as in DCR.
2. The “Context Weighting” part (for each individual component):
– σ is the weighting vector;
– ϵ is a threshold on the similarity of contexts,
i.e., only records with sufficiently similar (≥ ϵ) contexts are included.
3. In the calculations, the context similarities serve as weights, for example
in the neighbor-contribution component (sketched below);
the calculation is similar for the other components.
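A sketch of the similarity-weighted neighbor contribution, reusing weighted_jaccard from the sketch above; the data layout and helper names are hypothetical, and the exact formula is in the paper:

```python
def weighted_contribution(ratings_on_item, all_ratings, target_ctx, sigma, eps):
    """Neighbor contribution under DCW-style weighting.

    ratings_on_item: [(context, rating), ...] -- the neighbor's ratings of the target item
    all_ratings:     [(context, rating), ...] -- all of the neighbor's ratings (for the baseline)
    Records whose context similarity to target_ctx is below eps are ignored;
    the remaining similarities act as weights.
    """
    def weighted_avg(records):
        pairs = [(weighted_jaccard(c, target_ctx, sigma), r) for c, r in records]
        pairs = [(w, r) for w, r in pairs if w >= eps]
        total = sum(w for w, _ in pairs)
        return sum(w * r for w, r in pairs) / total if total else None

    rating = weighted_avg(ratings_on_item)   # weighted rating of the target item
    baseline = weighted_avg(all_ratings)     # weighted baseline of the neighbor
    if rating is None or baseline is None:
        return None                          # nothing similar enough -> no contribution
    return rating - baseline
```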
Particle Swarm Optimization (PSO)
The remaining work is to find optimal context relaxation vectors for
DCR and context weighting vectors for DCW. PSO is derived from
swarm intelligence, in which a group achieves a goal through
collaboration (schools of fish, flocks of birds, swarms of bees).
Why PSO?
1) It is easy to implement as a non-linear optimizer;
2) It has been used in weighted CF before and was shown to work
better than other non-linear optimizers, e.g., genetic algorithms;
3) Our previous work successfully applied binary PSO (BPSO) to DCR.
Particle Swarm Optimization (PSO)
Swarm = a group of birds
Particle = each bird ≈ one candidate vector evaluated by the algorithm
Vector = a bird's position in space ≈ the vector we need to learn
Goal = the location of the pizza ≈ lower prediction error
So, how does the swarm find the goal?
1. Looking for the pizza:
assume a machine can tell each bird its distance to the pizza.
2. Each iteration is an attempt or move.
3. Cognitive learning from the particle itself:
am I closer to the pizza than my "best" locations in previous history?
4. Social learning from the swarm:
"Hey, my distance is 1 mile. It is the closest! Follow me!"
Then the other birds move towards that position.
DCR – Feature selection – Modeled by binary vectors – Binary PSO
DCW – Feature weighting – Modeled by real-number vectors – PSO
How does it work? Take DCR and Binary PSO for example:
assume there are 4 components and 3 contextual dimensions;
there are then 4 binary vectors, one per component;
we merge them into a single vector of size 3 × 4 = 12;
this single vector is the particle's position vector in the PSO process (sketched below).
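A minimal sketch of that particle encoding together with the standard binary-PSO update (these are the textbook BPSO rules, not code from the authors):

```python
import math
import random

N_COMPONENTS, N_DIMS = 4, 3
SIZE = N_COMPONENTS * N_DIMS           # one 12-bit position vector per particle

def split(position):
    """Recover the four per-component relaxation vectors C1..C4 from one particle."""
    return [position[i * N_DIMS:(i + 1) * N_DIMS] for i in range(N_COMPONENTS)]

def bpso_step(position, velocity, personal_best, global_best, w=0.8, c1=2.0, c2=2.0):
    """One binary-PSO move for a single particle."""
    new_position, new_velocity = [], []
    for x, v, p, g in zip(position, velocity, personal_best, global_best):
        v = w * v + c1 * random.random() * (p - x) + c2 * random.random() * (g - x)
        prob = 1.0 / (1.0 + math.exp(-v))          # sigmoid maps velocity to a bit probability
        new_velocity.append(v)
        new_position.append(1 if random.random() < prob else 0)
    return new_position, new_velocity

# A particle's fitness would be the RMSE of DCR run with the vectors split() returns;
# personal and global bests are the positions with the lowest RMSE seen so far.
```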
Experimental Results
• Data Sets
• Predictive Performance
• Performance of Optimizer
Context-aware Data Sets
                  AIST Food Data                      Movie Data
# of Ratings      6360                                1010
# of Users        212                                 69
# of Items        20                                  176
Contexts          Real hunger (full/normal/hungry);   Time (weekend, weekday);
                  virtual hunger                      location (home, cinema);
                                                      companion (friends, alone, etc.)
Other features    User gender; food genre,            User gender;
                  food style, food stuff              year of the movie
Density           Dense                               Sparse
Context-aware data sets are usually difficult to obtain;
these two data sets were collected through surveys.
Evaluation Protocols
Metrics: root-mean-square error (RMSE) and coverage, which
denotes the percentage of test cases for which we can find neighbors
to make a prediction.
Our goal: improve RMSE (i.e., reduce error) while keeping decent
coverage. We allow some decline in coverage, because applying
contextual constraints usually lowers coverage (the sparsity of contexts!).
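A small sketch of the two metrics as used here, assuming a prediction function that returns None when no qualifying neighbors can be found (names are hypothetical):

```python
import math

def rmse_and_coverage(test_cases, predict):
    """test_cases: [(user, item, context, true_rating)]; predict returns a float or None."""
    squared_errors, covered = [], 0
    for u, i, ctx, r in test_cases:
        p = predict(u, i, ctx)
        if p is None:
            continue                       # no neighbors found -> case not covered
        covered += 1
        squared_errors.append((p - r) ** 2)
    rmse = math.sqrt(sum(squared_errors) / len(squared_errors)) if squared_errors else float("nan")
    coverage = covered / len(test_cases) if test_cases else 0.0
    return rmse, coverage
```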
Baselines:
– context-free CF, i.e., the original UBCF;
– contextual pre-filtering CF, which applies the contextual
constraint only to the neighbor selection component – none of the
other components used in DCR and DCW.
Other settings in DCR & DCW:
– K = 10 for UserKNN, evaluated with 5-fold cross-validation;
– T = 100 as the maximum number of iterations in the PSO process;
– weights are constrained to the range [0, 1];
– we use the same similarity threshold for every component,
iterated from 0.0 to 1.0 in steps of 0.1 in DCW.
Predictive Performances
Blue bars are RMSE values; red lines are coverage curves.
Findings:
1) DCW works better than DCR and the two baselines;
2) significance t-tests show that DCW is significantly better on the
movie data, whereas DCR is not significantly better than the two
baselines; DCW can further alleviate the sparsity of contexts and
compensates for DCR;
3) DCW offers better coverage than the baselines!
Performances of Optimizer
Running times are in seconds.
Using 3 particles is the best configuration for both data sets here.
Factors influencing running performance:
– more particles: quicker convergence, but probably higher cost per iteration;
– number of contextual variables: more contexts, probably slower;
– density of the data set: denser data means more calculations in DCW.
DCW typically costs more than DCR, because it uses all contextual
dimensions and the context-similarity calculations are time-consuming,
especially on dense data such as the Food data.
Other Results (Optional)
1. The optimal threshold for the similarity of contexts:
for the Food data set it is 0.6; for the Movie data set it is 0.1.
2. The optimal weighting vectors (e.g., Movie data):
note: darker → smaller weights; lighter → larger weights.
It is gonna end…
• Conclusions
• Future Work
Conclusions
– We propose DCW, a finer-grained improvement over DCR;
– it further improves predictive accuracy while keeping decent coverage;
– PSO is demonstrated to be an efficient optimizer;
– we identified the underlying factors that influence the optimizer's running time.
Stay Tuned
DCR and DCW are general frameworks (together called DCM,
differential context modeling); they can be applied to any
recommendation algorithm that can be decomposed into
multiple components.
We have successfully extended this approach to item-based
collaborative filtering and the Slope One recommender.
References
Y. Zheng, R. Burke, B. Mobasher. "Differential Context Modeling in
Collaborative Filtering". In: SOCRS-2013, Chicago, IL, USA, 2013.
Acknowledgement
Student Travel Support from US NSF (UMAP Platinum Sponsor)
Future Work
– Try other context-similarity measures instead of the simple Jaccard one;
– introduce semantics into the similarity of contexts to further alleviate
the sparsity of contexts, e.g., Rome is closer to Florence than to Paris;
– parallelize PSO or run PSO on MapReduce to speed up the optimizer.
See you later…
The 19th ACM SIGKDD Conference on Knowledge Discovery and
Data Mining (KDD), Chicago, IL USA, Aug 11-14, 2013
Thank You!
Center for Web Intelligence, DePaul University, Chicago, IL USA