Recommendation with
Differential Context Weighting
Yong Zheng
Robin Burke
Bamshad Mobasher
Center for Web Intelligence
DePaul University
Chicago, IL USA
Conference on UMAP
June 12, 2013
Overview
• Introduction (RS and Context-aware RS)
• Sparsity of Contexts and Relevant Solutions
• Differential Context Relaxation & Weighting
• Experimental Results
• Conclusion and Future Work
Introduction
• Recommender Systems
• Context-aware Recommender Systems
Recommender Systems (RS)
• Information Overload → Recommendations
Context-aware RS (CARS)
• Traditional RS: Users × Items → Ratings
• Context-aware RS: Users × Items × Contexts → Ratings
Example of contexts in different domains:
– Food: time (lunch, dinner), occasion (business lunch, family dinner)
– Movie: time (weekend, weekday), location (home, cinema), etc.
– Music: time (morning, evening), activity (study, sports, party), etc.
– Book: a book as a gift for kids or for a mother, etc.
Recommendations cannot be made well without considering contexts.
Research Problems
• Sparsity of Contexts
• Relevant Solutions
Sparsity of Contexts
• Assumption of context-aware RS: it is better to use preferences
expressed in the same contexts when making predictions.
• The same contexts? What about multiple context dimensions and sparsity?
An example in the movie domain:
Are there rating profiles in the contexts <Weekday, Home, Sister>?
User  Movie    Time     Location  Companion   Rating
U1    Titanic  Weekend  Home      Girlfriend  4
U2    Titanic  Weekday  Home      Girlfriend  5
U3    Titanic  Weekday  Cinema    Sister      4
U1    Titanic  Weekday  Home      Sister      ?
Relevant Solutions
Context Matching → the same contexts <Weekday, Home, Sister>?
1. Context Selection → use only the influential dimensions
2. Context Relaxation → use a relaxed set of dimensions, e.g., time
3. Context Weighting → use all dimensions, but measure how similar the contexts are (more on this later)
Differences between context selection and context relaxation:
– Context selection is conducted by surveys or statistics;
– Context relaxation is optimized directly for prediction performance;
– Optimal context relaxation/weighting is a learning process!
User  Movie    Time     Location  Companion   Rating
U1    Titanic  Weekend  Home      Girlfriend  4
U2    Titanic  Weekday  Home      Girlfriend  5
U3    Titanic  Weekday  Cinema    Sister      4
U1    Titanic  Weekday  Home      Sister      ?
DCR and DCW
• Differential Context Relaxation (DCR)
• Differential Context Weighting (DCW)
• Particle Swarm Optimization as the Optimizer
Differential Context Relaxation
Differential Context Relaxation (DCR) is our first attempt to alleviate
the sparsity of contexts, and differential context weighting (DCW) is a
finer-grained improvement over DCR.
• There are two notions in DCR:
– The “Differential” part → algorithm decomposition:
  ▪ separate the algorithm into different functional components;
  ▪ apply an appropriate context constraint to each component;
  ▪ maximize the global contextual effect jointly.
– The “Relaxation” part → context relaxation:
  use a relaxed set of context dimensions instead of all of them.
• References
– Y. Zheng, R. Burke, B. Mobasher. "Differential Context Relaxation for Context-aware
Travel Recommendation". In: EC-WEB 2012.
– Y. Zheng, R. Burke, B. Mobasher. "Optimal Feature Selection for Context-Aware
Recommendation using Differential Relaxation". In: RecSys 2012 Workshop on CARS.
DCR – Algorithm Decomposition
Take User-based Collaborative Filtering (UBCF) for example.
     Pirates of the   Kung Fu   Harry      Harry
     Caribbean 4      Panda 2   Potter 6   Potter 7
U1   4                4         2          2
U2   3                4         2          1
U3   2                2         4          4
U4   4                4         1          ?
Standard Process in UBCF (Top-K UserKNN, K=1 for example):
1). Find neighbors based on user-user similarity
2). Aggregate neighbors’ contribution
3). Make final predictions
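For reference, these three steps correspond to the classic mean-centered UserKNN prediction formula; the notation below is a sketch of that standard formula, not copied from the slides:

```latex
\hat{r}_{u,i} = \bar{r}_u +
  \frac{\sum_{v \in N_k(u,i)} \mathrm{sim}(u,v)\,\bigl(r_{v,i} - \bar{r}_v\bigr)}
       {\sum_{v \in N_k(u,i)} \lvert \mathrm{sim}(u,v) \rvert}
```

Here N_k(u,i) is the set of top-K neighbors of u who rated i (neighbor selection), r_{v,i} − r̄_v is the neighbor contribution, r̄_u is the user baseline, and sim(u,v) is the user similarity – exactly the four components named on the next slide.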
DCR – Algorithm Decomposition
Take User-based Collaborative Filtering (UBCF) for example.
1. Neighbor selection   2. Neighbor contribution   3. User baseline   4. User similarity
All components contribute to the final prediction; we assume an
appropriate contextual constraint on each component can leverage the
contextual effect in that component,
e.g., using only neighbors who rated in the same contexts (a sketch follows below).
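A minimal sketch of this per-component constraining, assuming hypothetical data structures (a list of (user, item, context, rating) tuples and a context-free similarity function) that are not from the slides; each component only uses ratings whose contexts match the target context on the dimensions selected by that component's relaxation vector:

```python
def matches(ctx, target_ctx, selected_dims):
    """True if ctx agrees with target_ctx on every dimension selected by the binary vector."""
    return all(ctx[d] == target_ctx[d] for d, sel in enumerate(selected_dims) if sel)

def dcr_predict(u, i, target_ctx, ratings, sim, C1, C2, C3, C4, k=10):
    """DCR-style UserKNN prediction sketch.

    ratings: list of (user, item, context_tuple, rating)
    sim(u, v): user-user similarity (component 4; C4 could restrict it further,
               kept context-free here for brevity)
    C1..C4: binary relaxation vectors, one per component.
    """
    # 1. Neighbor selection: users who rated i under contexts matching C1
    candidates = {v for (v, j, c, r) in ratings
                  if j == i and v != u and matches(c, target_ctx, C1)}
    neighbors = sorted(candidates, key=lambda v: sim(u, v), reverse=True)[:k]

    # 3. User baseline: u's average rating over contexts matching C3
    own = [r for (v, j, c, r) in ratings if v == u and matches(c, target_ctx, C3)]
    baseline = sum(own) / len(own) if own else 0.0

    num = den = 0.0
    for v in neighbors:
        # 2. Neighbor contribution: v's rating of i minus v's mean,
        #    both restricted to contexts matching C2
        v_on_i = [r for (w, j, c, r) in ratings
                  if w == v and j == i and matches(c, target_ctx, C2)]
        v_all = [r for (w, j, c, r) in ratings
                 if w == v and matches(c, target_ctx, C2)]
        if not v_on_i or not v_all:
            continue
        contribution = v_on_i[0] - sum(v_all) / len(v_all)
        s = sim(u, v)                                  # 4. User similarity
        num += s * contribution
        den += abs(s)

    return baseline + (num / den if den else 0.0)
```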
DCR – Context Relaxation
User  Movie    Time     Location  Companion   Rating
U1    Titanic  Weekend  Home      Girlfriend  4
U2    Titanic  Weekday  Home      Girlfriend  5
U3    Titanic  Weekday  Cinema    Sister      4
U1    Titanic  Weekday  Home      Sister      ?
Notion of context relaxation:
• Use {Time, Location, Companion} → 0 records matched!
• Use {Time, Location} → 1 record matched!
• Use {Time} → 2 records matched!
In DCR, we choose an appropriate context relaxation for each component,
balancing the number of matched ratings against the best performance
and the least noise.
DCR – Context Relaxation
(Prediction formula: contextual constraints C1–C4 applied to
1. Neighbor selection, 2. Neighbor contribution, 3. User baseline, 4. User similarity)
c is the original context, e.g., <Weekday, Home, Sister>.
C1, C2, C3, C4 are the relaxed contexts, one per component.
Each relaxation is modeled by a binary vector,
e.g., <1, 0, 0> denotes that only the first context dimension is selected.
Take neighbor selection for example:
originally, neighbors are the users who rated the same item;
DCR further filters those neighbors by the contextual constraint C1,
i.e., C1 = <1,0,0> → Time = Weekday → a neighbor must have rated i on weekdays
(see the sketch below).
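A small sketch, using hypothetical tuples that mirror the table above, of how a binary relaxation vector filters the candidate ratings and reproduces the matched counts from the previous slide:

```python
# (user, movie, (time, location, companion), rating) -- the rows from the table above
profiles = [
    ("U1", "Titanic", ("Weekend", "Home",   "Girlfriend"), 4),
    ("U2", "Titanic", ("Weekday", "Home",   "Girlfriend"), 5),
    ("U3", "Titanic", ("Weekday", "Cinema", "Sister"),     4),
]
target = ("Weekday", "Home", "Sister")     # the context of the prediction to make

def count_matches(relaxation):
    """Count ratings whose context matches the target on every selected dimension."""
    return sum(all(ctx[d] == target[d] for d, sel in enumerate(relaxation) if sel)
               for (_, _, ctx, _) in profiles)

print(count_matches((1, 1, 1)))   # {Time, Location, Companion} -> 0
print(count_matches((1, 1, 0)))   # {Time, Location}            -> 1
print(count_matches((1, 0, 0)))   # {Time}                      -> 2
```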
DCR – Drawbacks
1. Context relaxation is still strict, especially when data are sparse.
2. Components are dependent. For example, neighbor contribution
depends on neighbor selection: if neighbors are selected by
C1: Location = Cinema, there is no guarantee that a neighbor also has
ratings under context C2: Time = Weekend.
A finer-grained solution is required! → Differential Context Weighting
Differential Context Weighting
User  Movie    Time     Location  Companion   Rating
U1    Titanic  Weekend  Home      Girlfriend  4
U2    Titanic  Weekday  Home      Girlfriend  5
U3    Titanic  Weekday  Cinema    Sister      4
U1    Titanic  Weekday  Home      Sister      ?
– Goal: use all dimensions, but measure the similarity of contexts.
– Assumption: the more similar two contexts are, the more useful the
corresponding ratings are for prediction.
Similarity of contexts is measured by weighted Jaccard similarity:
c and d are two contexts (the two highlighted context rows in the table above);
σ is the weighting vector <w1, w2, w3> for the three dimensions.
Assuming equal weights, w1 = w2 = w3 = 1:
J(c, d, σ) = (sum of weights on matched dimensions) / (sum of all weights) = 2/3
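A minimal sketch of this weighted Jaccard similarity over context dimensions, assuming categorical dimensions compared by exact match (the helper name is hypothetical):

```python
def weighted_jaccard(c, d, sigma):
    """Weighted Jaccard similarity of two contexts.

    c, d:  tuples of context values, e.g. ("Weekday", "Home", "Sister")
    sigma: per-dimension weights <w1, w2, w3>
    """
    matched = sum(w for cv, dv, w in zip(c, d, sigma) if cv == dv)
    total = sum(sigma)
    return matched / total if total else 0.0

# With equal weights this reduces to (# matched dimensions) / (# dimensions):
print(weighted_jaccard(("Weekday", "Home", "Sister"),
                       ("Weekday", "Home", "Girlfriend"),
                       (1.0, 1.0, 1.0)))   # 0.666...
```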
Differential Context Weighting
(Prediction formula: contextual weighting applied to
1. Neighbor selection, 2. Neighbor contribution, 3. User baseline, 4. User similarity)
1. The “Differential” part → the components are the same as in DCR.
2. The “Context Weighting” part (for each individual component):
– σ is the weighting vector;
– ϵ is a threshold on the similarity of contexts,
i.e., only records with sufficiently similar (≥ ϵ) contexts are included.
3. In the calculations, the context similarities serve as weights, for example
in the neighbor-contribution component (sketched below);
the calculation is similar for the other components.
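A sketch of the similarity-weighted neighbor contribution, reusing weighted_jaccard from the sketch above; the data layout and helper names are hypothetical, and the exact formula is in the paper:

```python
def weighted_contribution(ratings_on_item, all_ratings, target_ctx, sigma, eps):
    """Neighbor contribution under DCW-style weighting.

    ratings_on_item: [(context, rating), ...] -- the neighbor's ratings of the target item
    all_ratings:     [(context, rating), ...] -- all of the neighbor's ratings (for the baseline)
    Records whose context similarity to target_ctx is below eps are ignored;
    the remaining similarities act as weights.
    """
    def weighted_avg(records):
        pairs = [(weighted_jaccard(c, target_ctx, sigma), r) for c, r in records]
        pairs = [(w, r) for w, r in pairs if w >= eps]
        total = sum(w for w, _ in pairs)
        return sum(w * r for w, r in pairs) / total if total else None

    rating = weighted_avg(ratings_on_item)   # weighted rating of the target item
    baseline = weighted_avg(all_ratings)     # weighted baseline of the neighbor
    if rating is None or baseline is None:
        return None                          # nothing similar enough -> no contribution
    return rating - baseline
```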
Particle Swarm Optimization (PSO)
The remaining work is to find optimal context relaxation vectors for
DCR and context weighting vectors for DCW. PSO is derived from
swarm intelligence, in which a group achieves a goal through
collaboration (schools of fish, flocks of birds, swarms of bees).
Why PSO?
1) It is easy to implement as a non-linear optimizer;
2) It has been used in weighted CF before and was shown to work
better than other non-linear optimizers, e.g., genetic algorithms;
3) Our previous work successfully applied binary PSO (BPSO) to DCR.
Particle Swarm Optimization (PSO)
Swarm = a group of birds
Particle = each bird ≈ one candidate vector evaluated by the algorithm
Vector = a bird's position in space ≈ the vector we need to learn
Goal = the location of the pizza ≈ lower prediction error
So, how does the swarm find the goal?
1. Looking for the pizza:
assume a machine can tell each bird its distance to the pizza.
2. Each iteration is an attempt or move.
3. Cognitive learning from the particle itself:
am I closer to the pizza than my "best" locations in previous history?
4. Social learning from the swarm:
"Hey, my distance is 1 mile. It is the closest! Follow me!"
Then the other birds move towards that position.
DCR – Feature selection – Modeled by binary vectors – Binary PSO
DCW – Feature weighting – Modeled by real-number vectors – PSO
How does it work? Take DCR and Binary PSO for example:
assume there are 4 components and 3 contextual dimensions;
there are then 4 binary vectors, one per component;
we merge them into a single vector of size 3 × 4 = 12;
this single vector is the particle's position vector in the PSO process (sketched below).
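A minimal sketch of that particle encoding together with the standard binary-PSO update (these are the textbook BPSO rules, not code from the authors):

```python
import math
import random

N_COMPONENTS, N_DIMS = 4, 3
SIZE = N_COMPONENTS * N_DIMS           # one 12-bit position vector per particle

def split(position):
    """Recover the four per-component relaxation vectors C1..C4 from one particle."""
    return [position[i * N_DIMS:(i + 1) * N_DIMS] for i in range(N_COMPONENTS)]

def bpso_step(position, velocity, personal_best, global_best, w=0.8, c1=2.0, c2=2.0):
    """One binary-PSO move for a single particle."""
    new_position, new_velocity = [], []
    for x, v, p, g in zip(position, velocity, personal_best, global_best):
        v = w * v + c1 * random.random() * (p - x) + c2 * random.random() * (g - x)
        prob = 1.0 / (1.0 + math.exp(-v))          # sigmoid maps velocity to a bit probability
        new_velocity.append(v)
        new_position.append(1 if random.random() < prob else 0)
    return new_position, new_velocity

# A particle's fitness would be the RMSE of DCR run with the vectors split() returns;
# personal and global bests are the positions with the lowest RMSE seen so far.
```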
Experimental Results
• Data Sets
• Predictive Performance
• Performance of Optimizer
Context-aware Data Sets
                  AIST Food Data                      Movie Data
# of Ratings      6360                                1010
# of Users        212                                 69
# of Items        20                                  176
Contexts          Real hunger (full/normal/hungry);   Time (weekend, weekday);
                  virtual hunger                      location (home, cinema);
                                                      companion (friends, alone, etc.)
Other features    User gender; food genre,            User gender;
                  food style, food stuff              year of the movie
Density           Dense                               Sparse
Context-aware data sets are usually difficult to obtain;
these two data sets were collected through surveys.
Evaluation Protocols
Metrics: root-mean-square error (RMSE) and coverage, which
denotes the percentage of test cases for which we can find neighbors
to make a prediction.
Our goal: improve RMSE (i.e., reduce error) while keeping decent
coverage. We allow some decline in coverage, because applying
contextual constraints usually lowers coverage (the sparsity of contexts!).
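A small sketch of the two metrics as used here, assuming a prediction function that returns None when no qualifying neighbors can be found (names are hypothetical):

```python
import math

def rmse_and_coverage(test_cases, predict):
    """test_cases: [(user, item, context, true_rating)]; predict returns a float or None."""
    squared_errors, covered = [], 0
    for u, i, ctx, r in test_cases:
        p = predict(u, i, ctx)
        if p is None:
            continue                       # no neighbors found -> case not covered
        covered += 1
        squared_errors.append((p - r) ** 2)
    rmse = math.sqrt(sum(squared_errors) / len(squared_errors)) if squared_errors else float("nan")
    coverage = covered / len(test_cases) if test_cases else 0.0
    return rmse, coverage
```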
Baselines:
– context-free CF, i.e., the original UBCF;
– contextual pre-filtering CF, which applies the contextual
constraint only to the neighbor selection component – none of the
other components used in DCR and DCW.
Other settings in DCR & DCW:
– K = 10 for UserKNN, evaluated with 5-fold cross-validation;
– T = 100 as the maximum number of iterations in the PSO process;
– weights are constrained to the range [0, 1];
– we use the same similarity threshold for every component,
iterated from 0.0 to 1.0 in steps of 0.1 in DCW.
Predictive Performances
Blue bars are RMSE values; red lines are coverage curves.
Findings:
1) DCW works better than DCR and the two baselines;
2) significance t-tests show that DCW is significantly better on the
movie data, whereas DCR is not significantly better than the two
baselines; DCW can further alleviate the sparsity of contexts and
compensates for DCR;
3) DCW offers better coverage than the baselines!
Performances of Optimizer
Running times are in seconds.
Using 3 particles is the best configuration for both data sets here.
Factors influencing running performance:
– more particles: quicker convergence, but probably higher cost per iteration;
– number of contextual variables: more contexts, probably slower;
– density of the data set: denser data means more calculations in DCW.
DCW typically costs more than DCR, because it uses all contextual
dimensions and the context-similarity calculations are time-consuming,
especially on dense data such as the Food data.
Other Results (Optional)
1. The optimal threshold for the similarity of contexts:
for the Food data set it is 0.6; for the Movie data set it is 0.1.
2. The optimal weighting vectors (e.g., Movie data):
note: darker → smaller weights; lighter → larger weights.
It is gonna end…
• Conclusions
• Future Work
Conclusions
– We propose DCW, a finer-grained improvement over DCR;
– it further improves predictive accuracy while keeping decent coverage;
– PSO is demonstrated to be an efficient optimizer;
– we identified the underlying factors that influence the optimizer's running time.
Stay Tuned
DCR and DCW are general frameworks (together called DCM,
differential context modeling); they can be applied to any
recommendation algorithm that can be decomposed into
multiple components.
We have successfully extended this approach to item-based
collaborative filtering and the Slope One recommender.
References
Y. Zheng, R. Burke, B. Mobasher. "Differential Context Modeling in
Collaborative Filtering". In: SOCRS-2013, Chicago, IL, USA, 2013.
Acknowledgement
Student Travel Support from US NSF (UMAP Platinum Sponsor)
Future Work
– Try other context-similarity measures instead of the simple Jaccard one;
– introduce semantics into the similarity of contexts to further alleviate
the sparsity of contexts, e.g., Rome is closer to Florence than to Paris;
– parallelize PSO or run PSO on MapReduce to speed up the optimizer.
See you later…
The 19th ACM SIGKDD Conference on Knowledge Discovery and
Data Mining (KDD), Chicago, IL USA, Aug 11-14, 2013
Thank You!
Center for Web Intelligence, DePaul University, Chicago, IL USA