Toon De Pessemier, Kris Vanhecke, Luc Martens,
September, 2016
iMinds – Ghent University, Belgium
toon.depessemier@ugent.be
A Scalable, High-performance Algorithm
for Hybrid Job Recommendations
Introduction: Job recommendations
Not a classic recommender story
Not a classic solution
• Specific metadata characteristics
  • Discipline, industry, career level, …
• Detailed user profile
  • Experience, education (university degree), employment
• Limited availability in time (active_during_test)
• Various user-item interactions
  • Click, bookmark, reply, delete
  • Specific meaning of delete (click on “X” → load new item)
• Impressions
  • Recommendations generated by XING’s recommender → bias
Our goals
• XING’s evaluation measure
  • Reflects typical XING use case
• Scalable
  • Number of users and items
  • Dataset = subset of XING users
• Incremental updates
  • Continuous stream of new job items
  • Updating models instead of recalculating
• Fast score calculation
  • New job items → fast distribution to target users
  • Limited computational resources
Findings
• Challenge = prediction task
  • ≠ recommendation task
  • No influence on user behavior
  • Recommendations are not evaluated by the user
• Important quality metrics are not evaluated
  • Usefulness
    Risk: items already discovered by the user. Items that the user has already interacted with can be recommended.
  • Diversity
    Risk: too much of the same
  • Serendipity
    Risk: items that are interesting but difficult to find are unfairly evaluated as “poor recommendations”
Findings
• The information value of impressions is limited
  • Recommendations of the existing job recommender
    • Bias towards XING’s algorithm
    • Less diverse
  • Subset of recommendations
  • No guarantee that the user has seen the item
  • Users who are not cold-start → better results if only the interactions are used
• Penalty for items with limited visibility
  • Low visibility → low probability of interaction
  • Low visibility → penalty → better results
  • Item visibility estimated by the number of interactions in the training set
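A minimal sketch of the visibility penalty, assuming scores are non-negative and interactions are (user, item) pairs. The slides only say that visibility is estimated from training-set interaction counts; the damping factor `n / (n + damping)` and the names `estimate_visibility` and `penalized_score` are illustrative assumptions, not the authors' formula.

```python
from collections import Counter


def estimate_visibility(interactions):
    """Count training-set interactions per item as a proxy for visibility."""
    return Counter(item for _user, item in interactions)


def penalized_score(score, item, visibility, damping=5.0):
    """Scale a non-negative score down for rarely seen items.

    The factor n / (n + damping) is an assumed penalty shape; it approaches 1
    for highly visible items and is 0 for items never seen in training.
    """
    n = visibility.get(item, 0)
    return score * n / (n + damping)


# Toy training set: item "a" has three interactions, item "b" has one.
interactions = [("u1", "a"), ("u2", "a"), ("u3", "a"), ("u1", "b")]
visibility = estimate_visibility(interactions)
print(penalized_score(1.0, "a", visibility))
print(penalized_score(1.0, "b", visibility))
```

With equal raw scores, the more visible item "a" ends up ranked above "b".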
Findings
• Influence of the user’s region
  • Expected: interest in jobs located in the user’s home region or in adjacent regions
  • Observed: many interactions with jobs located in non-adjacent or faraway regions
  • E.g. users from Lower Saxony → jobs in Baden-Württemberg
• Many cold-start users
  • No interactions, no impressions (9.7%)
  • CB recommendation based on explicit profile
    • Risk: profile too general or too specific
    • Risk: profile not updated by the user
Findings
• Traditional classification does not work
  • Positive class: click, bookmark, reply
  • Negative class: delete
  • Recommendations: items most typical for the positive class
  • Poor score
• Reasoning: meaning of the delete action
  • Click on the X button in the recommendation list
  • A new recommendation is loaded and displayed
  • Deletes are not sampled from the complete set of job offers but from the recommendations (bias: items more similar to the user’s interests than random items)
  • Not necessarily disinterest of the user
  • Intention of the click: get a new recommendation
Content-Based Recommender
• Based on feature matching
  • Explicit user profile
  • Interactions → counter for each feature
• Interaction weight
  • Updating counters
  • Delete = 0, click = 1, bookmark = 10, reply = 10 (no significant effect of deletes)
  • Positive counters (pos_{f,u}) → item has the feature
  • Negative counters (neg_{f,u}) → item does not have the feature
• Score calculation
  • α = 0.5 (positive counters are more important than negative counters)
  • IDF = inverse document frequency: feature frequency across all jobs
  • N = total number of items
  • n_f = number of items with feature f
  • w_f = weight per feature type (tag, discipline, industry, …)
  • u = user
  • i = item
\[ score(u,i) = \frac{1}{|\{f \in i\}|} \sum_{f \in i} w_f \left( pos_{f,u} - \alpha \, neg_{f,u} \right) \log \frac{N}{n_f} \]
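The counters and the score can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: features are modelled as hypothetical (type, value) pairs, the per-type weights of 1.0 are placeholders, and the negative counter is derived as "total interaction weight minus positive counter", which assumes every interaction counts either for or against each feature.

```python
import math
from collections import defaultdict

# Interaction weights and alpha as given on the slide.
WEIGHTS = {"delete": 0, "click": 1, "bookmark": 10, "reply": 10}
ALPHA = 0.5


class UserProfile:
    """Per-user feature counters that can be updated incrementally."""

    def __init__(self):
        self.pos = defaultdict(float)  # weight of interactions with items that have f
        self.total = 0.0               # weight of all interactions of this user

    def update(self, item_features, action):
        weight = WEIGHTS[action]
        self.total += weight
        for f in item_features:
            self.pos[f] += weight

    def neg(self, f):
        # Assumption: interactions with items lacking f make up the remaining weight.
        return self.total - self.pos.get(f, 0.0)


def cb_score(profile, item_features, n_items, items_with_feature, type_weight):
    """Average of w_f * (pos - alpha * neg) * log(N / n_f) over the item's features."""
    if not item_features:
        return 0.0
    total = 0.0
    for f in item_features:
        idf = math.log(n_items / items_with_feature[f])
        w_f = type_weight[f[0]]  # features are (type, value) pairs
        total += w_f * (profile.pos.get(f, 0.0) - ALPHA * profile.neg(f)) * idf
    return total / len(item_features)


# Toy data: 4 items in total, one click and one bookmark by the same user.
java, it = ("tag", "java"), ("industry", "it")
profile = UserProfile()
profile.update({java, it}, "click")
profile.update({it}, "bookmark")
score = cb_score(profile, {java, it}, n_items=4,
                 items_with_feature={java: 1, it: 2},
                 type_weight={"tag": 1.0, "industry": 1.0})
print(round(score, 4))
```

Storing only the positive counters and a running total keeps each profile update proportional to the number of features of the item, which fits the incremental-update goal.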
Content-based calculation
• Profile
  • Offline calculation
  • Incremental updates of counters
• IDF
  • Slightly varying over time
  • Periodic updates
• Target items
  • Active items
  • Minimum matching threshold (positive counters and item have X features in common)
• Algorithm running in parallel for different users
  • Fast calculation of the recommendations
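The candidate filter and the per-user parallelism might look like the sketch below. The function names, the data layout (feature sets per user and per active item) and the default threshold of 2 are assumptions made for illustration; the slides do not state the value of X.

```python
from concurrent.futures import ThreadPoolExecutor


def candidates(pos_features, active_items, min_common=2):
    """Keep active items sharing at least min_common features with the
    features that have a positive counter for this user."""
    return [item_id for item_id, feats in active_items.items()
            if len(pos_features & feats) >= min_common]


def candidates_for_all(user_pos, active_items, min_common=2, workers=4):
    """Users are independent, so each one can be handled by a separate worker."""
    def one(user):
        return user, candidates(user_pos[user], active_items, min_common)

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(one, user_pos))


# Toy data: three active items, two users.
active_items = {"j1": {"a", "b", "c"}, "j2": {"a"}, "j3": {"b", "c"}}
user_pos = {"u1": {"a", "b"}, "u2": {"c"}}
print(candidates_for_all(user_pos, active_items))
```

In CPython, threads do not speed up CPU-bound work because of the global interpreter lock; `ProcessPoolExecutor` from the same module is the usual substitute when the scoring itself dominates.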
Collaborative filtering: KNN
• Traditional KNN
  • Distance based on interactions
• Our KNN solution
  • Distance based on interactions and metadata
    • 2 items are similar if users have interacted with both
    • 2 items are similar if they have metadata features in common
      Feature distance: factor log(N / n_f)
  • Fine-grained distance function
  • Risk of ties is reduced
• Method:
  • For each candidate item:
    • Calculate the distance to the k nearest items that the user has positively interacted with
  • Select the items with the shortest distance
  • \[ score(u,i) = \frac{1}{k} \sum_{k} \frac{Dist_{max} - Dist_{i,k}}{Dist_{max}} \]
• Based on Weka Framework
  • BallTree implementation of NearestNeighbourSearch package
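The KNN score can be illustrated in a few lines of Python. This sketch takes the distance function as a parameter, because the slides do not spell out how interactions and metadata are combined; `knn_score`, the toy distance table and the choice to average over the neighbours actually found (when the user has fewer than k liked items) are assumptions. It does not use Weka.

```python
import heapq


def knn_score(candidate, liked_items, dist, k, dist_max):
    """Mean of (dist_max - d) / dist_max over the k liked items nearest to the candidate."""
    nearest = heapq.nsmallest(k, (dist(candidate, j) for j in liked_items))
    if not nearest:
        return 0.0
    # Assumption: average over the neighbours found if fewer than k exist.
    return sum((dist_max - d) / dist_max for d in nearest) / len(nearest)


# Toy distances from candidate "c" to three items the user interacted with positively.
table = {("c", "a"): 0.2, ("c", "b"): 0.6, ("c", "d"): 1.0}
score = knn_score("c", ["a", "b", "d"], lambda i, j: table[(i, j)], k=2, dist_max=1.0)
print(round(score, 3))
```

A candidate that is close to several liked items scores near 1, while one at the maximum distance from all of them scores 0.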
KNN calculation
• Item distances
  • Offline calculation
  • Slightly varying over time
  • If partially computed distance > threshold → stop calculation
• Score calculation
  • Fast if distances are precomputed
• Algorithm running in parallel for different users
  • Fast calculation of the recommendations
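The early-stopping rule for item distances can be sketched as below, assuming the distance is a sum of non-negative per-feature terms, so that a partial sum is a lower bound on the final value. The squared-difference terms and the name `bounded_distance` are placeholders; the authors' distance function is not given in the slides.

```python
def bounded_distance(x, y, threshold):
    """Sum squared differences, giving up as soon as the partial sum exceeds threshold.

    Returns the distance, or None if the calculation was stopped early.
    """
    partial = 0.0
    for a, b in zip(x, y):
        partial += (a - b) ** 2
        if partial > threshold:
            return None  # this pair cannot be among the nearest neighbours
    return partial


print(bounded_distance([0, 0, 0], [1, 1, 1], threshold=1.5))
print(bounded_distance([0, 0], [1, 1], threshold=5.0))
```

The first call stops after two of the three terms, because the partial sum already exceeds the threshold.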
Results and fallback
• CB: 286,041.10
• KNN: 298,316.85
• Hybrid: 344,264.37
• Fallback for cold-start users:
  • No interactions:
    • KNN based on interactions is not possible (26.5% of users)
    • No interactions → use impressions (16.8% of users)
    • Solution without fallback to impressions (only based on profile): 292,909.26
  • No interactions and no impressions (9.7% of the users):
    • Hybrid → CB
  • CB cannot generate recommendations:
    • For 1485 users
    • Recommend the 30 most popular items (most positive interactions)
    • Without fallback to most popular recommender: 344,241.51
    • Most popular recommender as the only solution: 73,298.13
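The fallback chain can be written down as a short dispatch function. This is a sketch under assumptions: the recommenders are passed in as hypothetical callables (`hybrid`, `hybrid_on_impressions`, `cb`), and the most-popular list is used whenever the selected recommender returns nothing, which slightly generalizes the slide (it only mentions this case for CB).

```python
from collections import Counter


def most_popular(positive_interactions, n=30):
    """Top-n items by number of positive interactions."""
    counts = Counter(item for _user, item in positive_interactions)
    return [item for item, _count in counts.most_common(n)]


def recommend(user, interactions, impressions,
              hybrid, hybrid_on_impressions, cb, popular):
    """Pick a recommender by the data available for the user, then fall back."""
    if interactions.get(user):
        recs = hybrid(user)                 # regular case
    elif impressions.get(user):
        recs = hybrid_on_impressions(user)  # no interactions: use impressions
    else:
        recs = cb(user)                     # profile only
    return recs if recs else popular        # nothing generated: most popular


# Toy data with stub recommenders.
positives = [("u1", "a"), ("u2", "a"), ("u1", "b")]
popular = most_popular(positives, n=1)
interactions = {"u1": ["a", "b"]}
impressions = {"u3": ["c"]}
args = (interactions, impressions,
        lambda u: ["h"], lambda u: ["imp"], lambda u: [], popular)
print(recommend("u1", *args), recommend("u3", *args), recommend("u9", *args))
```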
Questions?
