Deep learning:
the future of recommendations
Balázs Hidasi
Head of Data Mining and Research
Gravity meetup @ Startup Safary
April 21, 2016
Deep learning in the headlines
Deep learning in the background
• Life improving services
 Speech recognition
 Personal assistants (e.g. Siri,
Cortana)
 Computer vision, object
recognition
 Machine translation
 Chatbot technology
 Natural Language Processing
 Face recognition
 Self-driving cars
• For fun
 Text generation
 Composing music
 Painting pictures
 Etc.
What is deep learning?
• A class of machine learning algorithms
 that use a cascade of multiple non-linear processing layers
 and complex model structures
 to learn different representations of the data in each layer
 where higher level features are derived from lower level
features
 to form a hierarchical representation.
Deep learning is not a new topic
• First deep network proposed in the 1970s
• More papers in the 80s and 90s
• Why now?
 Older research was not used widely in practice
 Applications were much simpler than today’s
Neural networks: a brief overview
Neurons, neural networks
• Neuron: rough abstraction of the human neuron
 Receives inputs (signals)
 If the weighted sum of inputs is large enough → it fires a signal
 Amplifiers and inhibitors
 Basic pattern recognition
• Neural network: neurons connected to one another
• Feedforward networks: neurons are organized into
layers
 Connections only between adjacent layers
[Figure: a single neuron computing $y = f\left(\sum_{i=1}^{N} w_i x_i + b\right)$ from inputs $x_1, \dots, x_4$; and a feedforward network with inputs $x_1, x_2, x_3$, a first hidden layer $h_1^1, h_2^1, h_3^1$ and a second hidden layer $h_1^2, h_2^2$]
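The neuron and the layered feedforward structure described above can be sketched in a few lines of NumPy. This is a toy illustration: the logistic sigmoid stands in for the activation f, and the layer sizes and random weights are made up.

```python
import numpy as np

def sigmoid(s):
    return 1.0 / (1.0 + np.exp(-s))

def neuron(x, w, b):
    # weighted sum of the inputs plus a bias, passed through a non-linearity
    return sigmoid(np.dot(w, x) + b)

def feedforward(x, weights, biases):
    # neurons organized into layers; connections only between adjacent layers
    h = x
    for W, b in zip(weights, biases):
        h = sigmoid(W @ h + b)
    return h

rng = np.random.default_rng(0)
x = rng.normal(size=3)                         # 3 inputs
W1, b1 = rng.normal(size=(3, 3)), np.zeros(3)  # 1st hidden layer: 3 neurons
W2, b2 = rng.normal(size=(2, 3)), np.zeros(2)  # 2nd hidden layer: 2 neurons

single = neuron(x, W1[0], b1[0])   # output of one first-layer neuron
y = feedforward(x, [W1, W2], [b1, b2])
print(y.shape)  # (2,)
```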
Networks that are big enough: go deep, not wide
• Feedforward neural networks are universal
approximators
 Can imitate any function if they are big enough
 (They also need enough input-output pairs to learn from)
• What is big enough?
 Number of layers / neurons
 Theoretical “big enough” conditions massively overshoot
• Go deep, not wide
 The number of neurons required for good approximation is
polynomial in the input if the network is deep enough
 Otherwise it is exponential
Training neural networks
• Forward pass: get the current estimate of the target
o $s_j^1 = \sum_i w_{i,j}^1 x_i + b_j^1$;  $h_j^1 = f(s_j^1)$
o $s_k^2 = \sum_j w_{j,k}^2 h_j^1 + b_k^2$;  $h_k^2 = f(s_k^2)$
o …
o $s_l^O = \sum_k w_{k,l}^{N+1} h_k^N + b_l^O$;  $y_l = f(s_l^O)$
• Backward pass: correct the weights to reduce the error
 Gradient descent, layer by layer; each gradient is taken w.r.t. the weights between the current and the previous layer:
o Output layer: the error is the defined loss, e.g. $L = \sum_{i=1}^{N_o} \left(y_i - \hat{y}_i\right)^2$; gradient
$\frac{\partial L}{\partial w_{j,i}^{N+1}} = \frac{\partial L}{\partial y_i} \cdot \frac{\partial y_i}{\partial s_i^O} \cdot \frac{\partial s_i^O}{\partial w_{j,i}^{N+1}} = \frac{\partial L}{\partial y_i}\, f'(s_i^O)\, h_j^N$
o $N$th hidden layer: error $\delta_i^N = \frac{\partial L}{\partial y_i} \cdot \frac{\partial y_i}{\partial s_i^O}$; gradient
$\frac{\partial L}{\partial w_{k,j}^{N}} = \sum_i \frac{\partial L}{\partial y_i} \cdot \frac{\partial y_i}{\partial s_i^O} \cdot \frac{\partial s_i^O}{\partial h_j^N} \cdot \frac{\partial h_j^N}{\partial s_j^N} \cdot \frac{\partial s_j^N}{\partial w_{k,j}^{N}} = \sum_i \delta_i^N w_{i,j}^{N+1}\, f'(s_j^N)\, h_k^{N-1}$
o $(N-1)$th hidden layer: error $\delta_j^{N-1} = \sum_i \delta_i^N w_{i,j}^{N+1} f'(s_j^N)$; gradient
$\frac{\partial L}{\partial w_{l,k}^{N-1}} = \sum_j \delta_j^{N-1} w_{j,k}^{N}\, f'(s_k^{N-1})\, h_l^{N-2}$
o …
o 1st hidden layer: gradient
$\frac{\partial L}{\partial w_{i,j}^{1}} = \sum_k \delta_k^{1} w_{k,j}^{2}\, f'(s_j^1)\, x_i$
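The forward and backward passes above can be sketched for a single hidden layer in NumPy. This is a toy illustration with sigmoid activations and squared-error loss (up to a constant factor); the sizes, learning rate, and step count are arbitrary choices, not tied to the slide's notation.

```python
import numpy as np

def sigmoid(s):
    return 1.0 / (1.0 + np.exp(-s))

rng = np.random.default_rng(42)
x = rng.normal(size=4)               # a single input example
t = np.array([0.0, 1.0])             # its target output
W1, b1 = rng.normal(size=(3, 4)) * 0.5, np.zeros(3)
W2, b2 = rng.normal(size=(2, 3)) * 0.5, np.zeros(2)
lr = 1.0

for step in range(500):
    # Forward pass: current estimate of the target
    s1 = W1 @ x + b1
    h1 = sigmoid(s1)
    s2 = W2 @ h1 + b2
    y = sigmoid(s2)
    # Backward pass: propagate the error, then take a gradient descent step
    delta2 = (y - t) * y * (1.0 - y)            # dL/ds2, using f'(s) = y(1-y)
    delta1 = (W2.T @ delta2) * h1 * (1.0 - h1)  # dL/ds1
    W2 -= lr * np.outer(delta2, h1); b2 -= lr * delta2
    W1 -= lr * np.outer(delta1, x);  b1 -= lr * delta1

print(np.round(y, 2))  # close to the target [0, 1] after training
```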
Challenges of training deep networks
• Saturation
• Vanishing gradients
• Overfitting
• Slowness of second order methods
• Slow convergence, getting stuck in local optima with first
order methods
• (Exploding gradients)
Why now?
Breakthroughs in research
• Saturation & vanishing gradients
 Layer-by-layer training (2006)
 Non-saturating activation functions, e.g. ReLU (2013)
• Overfitting
 Dropout (2014)
• Convergence problems
 Adagrad, Adadelta, Adam, RMSProp, etc.
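Two of the ingredients listed above can be sketched briefly: the non-saturating ReLU activation and an Adagrad-style per-parameter update. The loss, learning rate, and iteration count below are arbitrary toy choices for illustration.

```python
import numpy as np

def relu(s):
    return np.maximum(0.0, s)   # gradient is 1 for s > 0: no saturation

def adagrad_update(w, grad, accum, lr=0.5, eps=1e-8):
    # Adagrad: accumulate squared gradients, scale the step per parameter
    accum += grad ** 2
    w -= lr * grad / (np.sqrt(accum) + eps)
    return w, accum

w = np.array([1.0, -2.0])       # toy parameters
accum = np.zeros_like(w)
for _ in range(200):
    grad = 2.0 * w              # gradient of the toy loss L = ||w||^2
    w, accum = adagrad_update(w, grad, accum)
print(np.round(w, 3))  # both parameters driven toward 0
```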
Computational power
• Natural increase in computational power
• GPGPU (general-purpose GPU) technology
Intermission
Don’t give in to the HYPE
• Deep learning is impressive but
 deep learning is not true AI
o it may be a component of it when
and if AI is created
 deep learning is not how the human
brain works
 95% of machine learning tasks don’t
require deep learning
 deep learning requires a lot of
computational power
• Deep learning is a tool
 which is successful in certain,
previously very challenging domains
(speech recognition, computer
vision, NLP, etc.)
 that excels in pattern recognition
You are here
Deep learning for RecSys
From the Netflix prize...
• Netflix prize (2006-2009)
 Gave a huge push to recommender systems research
 Determined the direction of research for years
 Task:
o Some (User, Item, Rating) known triplets
o (User, Item) pairs with unknown rating
o Predict the missing ratings (1-5)
... to recommenders in practice
• Ratings → events [implicit feedback]
 Lots of services don’t allow for rating
 Majority of users don’t rate
 Monitored passively → preferences have to be inferred
• Rating prediction → ranking [top-N recommendations]
 All that matters is the relevancy of the top N items
 Rating prediction is biased
• User → session / situation [session-based / context-driven
recommendation]
 Users are not logged in, identification is unreliable
 Accounts used by multiple users
 Aim of the session (e.g. buy a good laptop)
 Similar behavior of different users in a situation, different behavior of the same
user in different situations
Challenges in RecSys
• Session modeling
 Most of the algorithms are personalized
 A few are item-to-item
o Recommends similar items
o Also used for session-based recommendations (industry de facto standard)
 There are no good session-based solutions
• Incorporating factors that influence user clicks
 Users click based on what they see
o Title
o Product image
o Description
 and on their knowledge of the product
o Usually harder to model
o Except when the product is content (e.g. music)
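The item-to-item idea mentioned above can be sketched as co-occurrence counting with a simple normalization. The sessions below are made-up toy data, and the row normalization is one simple choice; real item-kNN implementations typically also normalize by item supports.

```python
import numpy as np

# Tiny made-up sessions: each is a list of item ids viewed together
sessions = [[0, 1, 2], [0, 1], [1, 2, 3], [0, 3]]
n_items = 4

# Count how often each item pair co-occurs within a session
counts = np.zeros((n_items, n_items))
for s in sessions:
    for i in s:
        for j in s:
            if i != j:
                counts[i, j] += 1

# Normalize rows, then recommend the items most similar to item 0
norms = np.sqrt((counts ** 2).sum(axis=1, keepdims=True)) + 1e-12
sim = counts / norms
top = np.argsort(-sim[0])
print(top[0])  # item 1 co-occurs with item 0 most often
```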
Deep learning to the rescue – Session modeling
• Recurrent Neural Networks (RNN)
 Sequence modeling
 Hidden state: next state is based on the previous hidden state and the current input
 “Infinite” depth
 More sophisticated versions: GRU, LSTM
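The hidden-state recurrence described above can be sketched with a plain (non-gated) RNN cell; GRU and LSTM add gating on top of this idea. Sizes and weights here are made-up toy values.

```python
import numpy as np

rng = np.random.default_rng(1)
d_in, d_hid = 4, 3
W_x = rng.normal(size=(d_hid, d_in)) * 0.5   # input-to-hidden weights
W_h = rng.normal(size=(d_hid, d_hid)) * 0.5  # hidden-to-hidden weights
b = np.zeros(d_hid)

def rnn_step(h_prev, x):
    # next state depends on the previous hidden state and the current input:
    # h_t = tanh(W_x x_t + W_h h_{t-1} + b)
    return np.tanh(W_x @ x + W_h @ h_prev + b)

h = np.zeros(d_hid)                      # initial hidden state
sequence = rng.normal(size=(5, d_in))    # a session of 5 "events"
for x in sequence:
    h = rnn_step(h, x)                   # state summarizes the session so far
print(h.shape)  # (3,)
```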
• Needs to be adapted to the recommendation task
• GRU4Rec:
 Session-parallel minibatch training for handling the large variance in session lengths
 Sampling the output for reasonable training times, without losing much accuracy
 Ranking loss for better item ranking
• Results: 15-30% improvement over item-to-item recommendations
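The ranking-loss idea can be illustrated with a BPR-style pairwise loss, a common choice for implicit feedback: the score of the item actually clicked next should exceed the scores of sampled negative items. The scores below are made-up numbers, not actual model outputs.

```python
import numpy as np

def bpr_loss(pos_score, neg_scores):
    # BPR: -mean log sigmoid(positive score - negative scores);
    # lower when the positive item is ranked above the negatives
    return -np.mean(np.log(1.0 / (1.0 + np.exp(-(pos_score - neg_scores)))))

neg = np.array([0.2, -0.1, 0.5])   # scores for sampled negative items
good = bpr_loss(2.0, neg)          # positive item scored clearly above
bad = bpr_loss(-1.0, neg)          # positive item scored below
print(good < bad)  # True: better ranking gives lower loss
```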
[Charts: Recall@20 and MRR@20 on the RSC15 and VIDEO datasets, comparing Item-kNN and GRU4Rec; GRU4Rec outperforms Item-kNN on both metrics and both datasets]
Other uses of deep learning for recsys
• Incorporating content directly
 Music, images, video, text
 Aspects of the items that influence users
 Direct content representation
• Context-state modeling from sensory data
 IoT devices
 Lots of sensory data
 Some of it missing or noisy
 Infer context state and recommend accordingly
• Interactive recommenders using chatbots
• Personalized content generation
 Today’s news
 Images in personalized style with personalized content
• Etc...
There is work to be done
• DL + RecSys research: just started
 Last year:
o 0 long papers; 1 short paper and 1 poster that are loosely connected
 This year:
o 10+ submissions to RecSys on this topic
o DLRS 2016 workshop @ RecSys
• Open questions
 (More) Application areas
 Adaptations required for the recsys problem
 Scalability
 Best practices
 ...
Thanks for your attention!