Building Recommender Systems - Mendeley and Science Direct

| 0
Daniel Kershaw (@danjamker)
Building Recommenders
20th September 2017

| 1
Mendeley
• Reference Manager
• Social Network
• Publication Catalogue

| 2
Science Direct
• Scientific publication database
• Used by the majority of university and research institutions
• Contains 12 million articles of content from 3,500 academic journals and 34,000 e-books

| 4
Why Recommendations
Pull
Allow users to discover more content
Make it easier to navigate catalogue
Push
Highlight new content to users
Bring users back to service

| 5
The five core components
Data Collection
Recommender Model
Recommendation Post Processing
Online Modules
User Interface

| 6
Outline
Developed Algorithms – keeping it simple
Practical Considerations – don’t look stupid
Implementation – how to scale a system
Evaluation – what is good enough
Evolution – what’s changed over time
Future Direction – the future’s bright, the future’s deep

| 8
Available Data
Implicit
User libraries (Mendeley)
User article interactions (Science Direct)
Content
Abstracts
Titles
References

| 9
Potential Methods
Content Based
Similarity between what users have read
Similarity in references
Collaborative Filtering
Matrix Factorization
KNN
LDA

| 10
User item interaction matrix
User-based CF (kNN)
https://buildingrecommenders.wordpress.com/2015/11/18/overview-of-recommender-algorithms-part-2/

| 11
Similarity between query users and other readers

| 12
Similarity between all users

| 13
Generating recommendations for user
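
The three steps on the preceding slides (compare the query user with other readers, keep the most similar ones, recommend what they have that the query user lacks) can be sketched as follows. This is a minimal illustration, not the production system: it assumes binary implicit feedback stored as one set of item IDs per user, cosine similarity between those sets, and hypothetical names (`libraries`, `recommend`, `k`, `n`).

```python
import math
from collections import defaultdict


def cosine(items_a, items_b):
    """Cosine similarity between two sets of item IDs (binary interactions)."""
    if not items_a or not items_b:
        return 0.0
    return len(items_a & items_b) / math.sqrt(len(items_a) * len(items_b))


def recommend(libraries, query_user, k=2, n=5):
    """User-based kNN: score unseen items by the similarity of the neighbours who hold them."""
    query_items = libraries[query_user]

    # Similarity between the query user and every other user.
    similarities = [
        (cosine(query_items, items), user)
        for user, items in libraries.items()
        if user != query_user
    ]
    # Keep the k most similar users that share at least one item.
    neighbours = sorted((s for s in similarities if s[0] > 0), reverse=True)[:k]

    # Each neighbour votes for the items the query user does not have yet.
    scores = defaultdict(float)
    for similarity, user in neighbours:
        for item in libraries[user] - query_items:
            scores[item] += similarity

    ranked = sorted(scores.items(), key=lambda pair: (-pair[1], pair[0]))
    return [item for item, _ in ranked[:n]]


libraries = {
    "u1": {"a", "b"},
    "u2": {"a", "b", "c"},
    "u3": {"a", "d"},
    "u4": {"x"},
}
print(recommend(libraries, "u1"))
```

Comparing the query user against all users is quadratic in the number of users, which is why the ability to scale matters for this approach.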

| 14
Why not Matrix Factorization?
• Ability to scale
• Matrix incredibly sparse

| 16
Explore/Exploit (Dithering)
Recommendations generated in batch
Users want an interactive experience
Slight shuffles give the impression of freshness
Allow for the exploration of the list if only a proportion is shown
$\mathrm{score}_{\mathrm{dithered}} = \log(\mathrm{rank}) + N(0, \log \varepsilon)$
where $\varepsilon = \Delta\mathrm{rank} / \mathrm{rank}$ and typically $\varepsilon \in [1.5, 2]$
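
A minimal sketch of dithering a batch-generated list, following the formula on this slide. Two points are assumptions rather than something the slide states: the second argument of N is treated as a variance (so the standard deviation is the square root of log ε), and the function name `dither` and its `seed` parameter are hypothetical.

```python
import math
import random


def dither(ranked_items, epsilon=1.5, seed=None):
    """Re-order a ranked list by adding Gaussian noise to the log of each rank."""
    if epsilon < 1.0:
        raise ValueError("epsilon must be >= 1 so that log(epsilon) is non-negative")
    rng = random.Random(seed)
    # Assumption: log(epsilon) is the variance of the noise term.
    sigma = math.sqrt(math.log(epsilon))

    scored = []
    for rank, item in enumerate(ranked_items, start=1):
        noisy_score = math.log(rank) + rng.gauss(0.0, sigma)
        scored.append((noisy_score, item))

    # Lower dithered score means a higher position in the list.
    scored.sort(key=lambda pair: pair[0])
    return [item for _, item in scored]


items = list("abcdefghij")
print(dither(items, epsilon=2.0, seed=42))
```

Because the noise is added on a log scale, items near the top swap places with close neighbours while items deep in the list can move further, and `epsilon=1.0` leaves the order unchanged. Re-seeding on a schedule gives a different shuffle for each time window.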

| 17
Impression Discounting
• Experience deteriorates if exposed to the same information
• Push recommendations seen before down the list
[Figure axes: Rank, Impressions]

| 18
Impression Discounting
• Experience deteriorates if exposed to the same information
• Push recommendations seen before down the list
$\mathrm{score}_{\mathrm{new}} = \mathrm{score}_{\mathrm{original}} \cdot \bigl(w_1\, g(\mathrm{impCount}) + w_2\, g(\mathrm{lastSeen})\bigr)$
See Lee, P. et al. (2014)
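
The discounting formula can be sketched as below. The slide does not say what g is, so the two shapes used here are assumptions chosen only for illustration: an inverse function of the impression count, and a half-life recovery on the number of days since the item was last shown. The weights, the half-life and all function names are hypothetical.

```python
def g_imp_count(imp_count):
    """Assumed shape: the more often an item was shown, the smaller the factor."""
    return 1.0 / (1.0 + imp_count)


def g_last_seen(days_since_last_seen, half_life_days=7.0):
    """Assumed shape: the factor recovers towards 1 as the last impression ages."""
    if days_since_last_seen is None:  # never shown before
        return 1.0
    return 1.0 - 0.5 ** (days_since_last_seen / half_life_days)


def discounted_score(score, imp_count, days_since_last_seen, w1=0.5, w2=0.5):
    """score_new = score_original * (w1 * g(impCount) + w2 * g(lastSeen))."""
    factor = w1 * g_imp_count(imp_count) + w2 * g_last_seen(days_since_last_seen)
    return score * factor


print(discounted_score(1.0, imp_count=0, days_since_last_seen=None))
print(discounted_score(1.0, imp_count=3, days_since_last_seen=0))
```

With weights that sum to 1, an item that has never been shown keeps its original score, and an item shown repeatedly and recently is pushed down the list.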

| 19
Business Logic (Pre and Post Filtering)
Don’t show items they already have (bought, added, consumed)
Don’t feed the recommender positive feedback from recommender
Don’t recommend out of stock items
• A bad recommender has a cost
- Can be greater than not receiving a recommendation
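
A post-filter for the first and third rules can be as small as the sketch below. The function and parameter names are hypothetical, and where the lists of consumed and unavailable items come from is left open.

```python
def post_filter(candidates, already_consumed, unavailable):
    """Drop items the user already has and items that cannot be served, keeping order."""
    blocked = set(already_consumed) | set(unavailable)
    return [item for item in candidates if item not in blocked]


ranked = ["p1", "p2", "p3", "p4"]
print(post_filter(ranked, already_consumed={"p2"}, unavailable={"p4"}))
```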

| 21
Systems Architecture
[Architecture diagram, split into Online and Offline parts. Labels: Front End, API, Impression Discounting, Dithering, Candidate Selection, Content Based, Item2Item, CF, Logs, AWS]

| 23
Logging
Used for both debugging and feeding information to recommender
System
• Which run generated the recommendation
• What was served to the user
• How was the score modified
• What was removed from the recommendations
User (Feedback loop)
• What was displayed
• What was clicked
• When were they served
• Where the recommendations were displayed
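
One way to capture the system and user fields from this slide in a single record is sketched below. Every field name is hypothetical, and the JSON serialisation is only one possible format.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class RecommendationLog:
    # System side
    run_id: str                 # which run generated the recommendation
    user_id: str
    served: list                # what was served to the user
    score_changes: dict = field(default_factory=dict)  # how the score was modified
    removed: list = field(default_factory=list)        # what was filtered out
    # User side (feedback loop)
    displayed: list = field(default_factory=list)      # what was displayed
    clicked: list = field(default_factory=list)        # what was clicked
    served_at: str = ""         # when the list was served
    placement: str = ""         # where it was displayed

    def to_json(self):
        return json.dumps(asdict(self), sort_keys=True)


record = RecommendationLog(
    run_id="run-001",
    user_id="u1",
    served=["p1", "p3"],
    removed=["p2"],
    displayed=["p1"],
    clicked=["p1"],
    served_at="2017-09-20T10:00:00Z",
    placement="email",
)
print(record.to_json())
```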

| 25
Mendeley – Desktop Application
• User to Item CF
• Impression Discounting

| 26
Mendeley – Online
(listed from Most Personalized to Least Personalized)
• Implicit – serves recommendations based on user libraries
• Recent Activity – based on recent additions to a user’s library
• Research Interests – based on user-generated tags
• Discipline – based on their self-identified discipline
See Hristakeva, M. et al. (2017)

| 27
Mendeley – Online
• Remove carousels
• Focus on implicit recommendations
• Fall back to content based solution

| 28
Mendeley – Email
• Recommendations based on the complete library of the user
• Don’t send the same recommendations twice

| 29
Science Direct – Email
• Item to Item
• Take user reading history
• Get recommendations for each item
• Interleave recommendations
• Don’t send same recommendations twice
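
The email steps (one item-to-item list per article in the reading history, interleaved, with nothing sent twice) can be sketched as follows. The slide does not say how the lists are interleaved, so round-robin order is an assumption here, as are the names `interleave`, `already_sent` and `limit`.

```python
from itertools import zip_longest

_MISSING = object()  # filler for lists that run out early


def interleave(rec_lists, already_sent=(), limit=10):
    """Round-robin over several ranked lists, skipping duplicates and sent items."""
    seen = set(already_sent)
    merged = []
    # Each tier holds the i-th recommendation from every list.
    for tier in zip_longest(*rec_lists, fillvalue=_MISSING):
        for item in tier:
            if item is _MISSING or item in seen:
                continue
            seen.add(item)
            merged.append(item)
            if len(merged) == limit:
                return merged
    return merged


per_article = [["a", "b"], ["c", "a", "d"]]
print(interleave(per_article, already_sent={"b"}))
```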

| 30
Science Direct – Article Page
Item to Item
Dither recommendations every 30 minutes

| 32
Off-line Methodology
[Figure: user interactions ordered by time, split into a train part and a test part. Labels: Train model, Query, Ground truth, Test]
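
A time-based split of this kind can be sketched as below. The details are assumptions, since only the figure labels survive: interactions before a cutoff train the model, later interactions form the ground truth, and precision at k stands in for whatever metric was actually reported. All names are hypothetical.

```python
from collections import defaultdict


def temporal_split(interactions, cutoff):
    """Split (user, item, timestamp) tuples into training data and per-user ground truth."""
    train = []
    ground_truth = defaultdict(set)
    for user, item, timestamp in interactions:
        if timestamp < cutoff:
            train.append((user, item, timestamp))
        else:
            ground_truth[user].add(item)
    return train, dict(ground_truth)


def precision_at_k(recommended, relevant, k):
    """Fraction of the top-k recommendations that appear in the ground truth."""
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for item in recommended[:k] if item in relevant)
    return hits / k


events = [("u", "a", 10), ("u", "b", 20), ("v", "c", 5)]
train, truth = temporal_split(events, cutoff=15)
print(train, truth)
print(precision_at_k(["b", "z"], truth["u"], k=2))
```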

| 33
Off-line evaluation - Mendeley
From Hristakeva, M. et al. (2017)

| 34
Science Direct – Item-to-item

| 35
Static Recommendations for quick learnings
• Infrastructure takes a long time to build
• Need feedback from users to learn
1. Generate recommendations off-line
2. Send to users via email (A/A)
3. Modify method based on feedback
4. Send to a second set of users split into A/B buckets
[Diagram: Email to users → Modify Recommender → Email to users]
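
Splitting users into buckets for the second send can be done deterministically, so that a user always lands in the same bucket for a given experiment. The sketch below hashes the user ID together with an experiment name; this scheme and the names `assign_bucket` and `experiment` are assumptions, not a description of how the emails were actually split.

```python
import hashlib


def assign_bucket(user_id, experiment, buckets=("A", "B")):
    """Map a user to a bucket via a stable hash of the experiment name and user ID."""
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode("utf-8")).hexdigest()
    return buckets[int(digest, 16) % len(buckets)]


users = [f"user-{i}" for i in range(1000)]
assignment = {user: assign_bucket(user, "email-test-2") for user in users}
print(sum(1 for bucket in assignment.values() if bucket == "A"))
```

An A/A send uses the same mechanism with identical treatment in both buckets, which helps to check the pipeline and the noise level before a real comparison.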

| 37
Learning to rank (LtR)
Currently only using implicit feedback
No content used
Use CF as candidate selection
Re-rank results based on learnt model optimised for CTR
Use item and user features

| 38
Deep Learning
Use to learn more complex features
Use as features in LtR
Build on the existing framework developed
Use pre-trained models before developing own

| 39
Conclusion (Take Homes)
• Log EVERYTHING
• Start Simple
• Iterate quickly
• Get recommendations out quickly to learn
• Don’t look stupid
• CTR ≇ Off-line Evaluation

| 40
Thank you
Book chapter being written based on the content in this presentation
www.elsevier.com/rd-solutions

| 41
References
Hristakeva, M., Kershaw, D., Rossetti, M., Knoth, P., Pettit, B., Vargas, S., & Jack, K. (2017). Building recommender systems for scholarly information. The 1st Workshop (pp. 25–32). New York, NY, USA: ACM. http://doi.org/10.1145/3057148.3057152
Rossetti, M., Stella, F., & Zanker, M. (2016). Contrasting Offline and Online Results when Evaluating Recommendation Algorithms (pp. 31–34). Presented at the Proceedings of the 10th ACM Conference on Recommender Systems, New York, NY, USA: ACM. http://doi.org/10.1145/2959100.2959176
Lee, P., Lakshmanan, L. V. S., Tiwari, M., & Shah, S. (2014). Modeling impression discounting in large-scale recommender systems (pp. 1837–1846). Presented at the Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, New York, NY, USA: ACM Press. http://doi.org/10.1145/2623330.2623356
Koren, Y. (2010). Collaborative filtering with temporal dynamics. Communications of the ACM, 53(4), 89–97. http://doi.org/10.1145/1721654.1721677
