Deep learning:
the future of recommendations
Balázs Hidasi
Head of Data Mining and Research
Gravity meetup @ Startup Safary
April 21, 2016
Deep learning in the headlines
Deep learning in the background
• Life improving services
 Speech recognition
 Personal assistants (e.g. Siri,
Cortana)
 Computer vision, object
recognition
 Machine translation
 Chatbot technology
 Natural Language Processing
 Face recognition
 Self-driving cars
• For fun
 Text generation
 Composing music
 Painting pictures
 Etc.
What is deep learning?
• A class of machine learning algorithms
 that use a cascade of multiple non-linear processing layers
 and complex model structures
 to learn different representations of the data in each layer
 where higher level features are derived from lower level
features
 to form a hierarchical representation.
Deep learning is not a new topic
• First deep network proposed in the 1970s
• More papers in the 80s and 90s
• Why now?
 Older research was not used widely in practice
 Applications were much simpler than today’s
Neural networks: a brief overview
Neurons, neural networks
• Neuron: rough abstraction of the human neuron
 Receives inputs (signals)
 If the weighted sum of inputs is large enough → it fires a signal
 Amplifiers and inhibitors
 Basic pattern recognition
• Neural network: neurons connected to one another
• Feedforward networks: neurons are organized into
layers
 Connections only between adjacent layers
[Figure: a single neuron computing $y = f\left(\sum_{i=1}^{N} w_i x_i + b\right)$ from inputs $x_1, \dots, x_4$; and a feedforward network with inputs $x_1, x_2, x_3$, a first hidden layer $h_1^1, h_2^1, h_3^1$ and a second hidden layer $h_1^2, h_2^2$]
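The neuron and the layered feedforward structure described above can be sketched in a few lines of NumPy. This is a toy illustration: the logistic sigmoid stands in for the activation f, and the layer sizes and random weights are made up.

```python
import numpy as np

def sigmoid(s):
    return 1.0 / (1.0 + np.exp(-s))

def neuron(x, w, b):
    # weighted sum of the inputs plus a bias, passed through a non-linearity
    return sigmoid(np.dot(w, x) + b)

def feedforward(x, weights, biases):
    # neurons organized into layers; connections only between adjacent layers
    h = x
    for W, b in zip(weights, biases):
        h = sigmoid(W @ h + b)
    return h

rng = np.random.default_rng(0)
x = rng.normal(size=3)                         # 3 inputs
W1, b1 = rng.normal(size=(3, 3)), np.zeros(3)  # 1st hidden layer: 3 neurons
W2, b2 = rng.normal(size=(2, 3)), np.zeros(2)  # 2nd hidden layer: 2 neurons

single = neuron(x, W1[0], b1[0])   # output of one first-layer neuron
y = feedforward(x, [W1, W2], [b1, b2])
print(y.shape)  # (2,)
```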
Networks that are big enough: go deep, not wide
• Feedforward neural networks are universal
approximators
 Can imitate any function if they are big enough
 (They also need enough input-output pairs to learn from)
• What is big enough?
 Number of layers / neurons
 Theoretical “big enough” conditions massively overshoot
• Go deep, not wide
 The number of neurons required for good approximation is
polynomial in the input if the network is deep enough
 Otherwise it is exponential
Training neural networks
• Forward pass: get the current estimate of the target
o $s_j^1 = \sum_i w_{i,j}^1 x_i + b_j^1$;  $h_j^1 = f(s_j^1)$
o $s_k^2 = \sum_j w_{j,k}^2 h_j^1 + b_k^2$;  $h_k^2 = f(s_k^2)$
o …
o $s_l^O = \sum_k w_{k,l}^{N+1} h_k^N + b_l^O$;  $y_l = f(s_l^O)$
• Backward pass: correct the weights to reduce the error
 Gradient descent, layer by layer; each gradient is taken w.r.t. the weights between the current and the previous layer:
o Output layer: the error is the defined loss, e.g. $L = \sum_{i=1}^{N_o} \left(y_i - \hat{y}_i\right)^2$; gradient
$\frac{\partial L}{\partial w_{j,i}^{N+1}} = \frac{\partial L}{\partial y_i} \cdot \frac{\partial y_i}{\partial s_i^O} \cdot \frac{\partial s_i^O}{\partial w_{j,i}^{N+1}} = \frac{\partial L}{\partial y_i}\, f'(s_i^O)\, h_j^N$
o $N$th hidden layer: error $\delta_i^N = \frac{\partial L}{\partial y_i} \cdot \frac{\partial y_i}{\partial s_i^O}$; gradient
$\frac{\partial L}{\partial w_{k,j}^{N}} = \sum_i \frac{\partial L}{\partial y_i} \cdot \frac{\partial y_i}{\partial s_i^O} \cdot \frac{\partial s_i^O}{\partial h_j^N} \cdot \frac{\partial h_j^N}{\partial s_j^N} \cdot \frac{\partial s_j^N}{\partial w_{k,j}^{N}} = \sum_i \delta_i^N w_{i,j}^{N+1}\, f'(s_j^N)\, h_k^{N-1}$
o $(N-1)$th hidden layer: error $\delta_j^{N-1} = \sum_i \delta_i^N w_{i,j}^{N+1} f'(s_j^N)$; gradient
$\frac{\partial L}{\partial w_{l,k}^{N-1}} = \sum_j \delta_j^{N-1} w_{j,k}^{N}\, f'(s_k^{N-1})\, h_l^{N-2}$
o …
o 1st hidden layer: gradient
$\frac{\partial L}{\partial w_{i,j}^{1}} = \sum_k \delta_k^{1} w_{k,j}^{2}\, f'(s_j^1)\, x_i$
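The forward and backward passes above can be sketched for a single hidden layer in NumPy. This is a toy illustration with sigmoid activations and squared-error loss (up to a constant factor); the sizes, learning rate, and step count are arbitrary choices, not tied to the slide's notation.

```python
import numpy as np

def sigmoid(s):
    return 1.0 / (1.0 + np.exp(-s))

rng = np.random.default_rng(42)
x = rng.normal(size=4)               # a single input example
t = np.array([0.0, 1.0])             # its target output
W1, b1 = rng.normal(size=(3, 4)) * 0.5, np.zeros(3)
W2, b2 = rng.normal(size=(2, 3)) * 0.5, np.zeros(2)
lr = 1.0

for step in range(500):
    # Forward pass: current estimate of the target
    s1 = W1 @ x + b1
    h1 = sigmoid(s1)
    s2 = W2 @ h1 + b2
    y = sigmoid(s2)
    # Backward pass: propagate the error, then take a gradient descent step
    delta2 = (y - t) * y * (1.0 - y)            # dL/ds2, using f'(s) = y(1-y)
    delta1 = (W2.T @ delta2) * h1 * (1.0 - h1)  # dL/ds1
    W2 -= lr * np.outer(delta2, h1); b2 -= lr * delta2
    W1 -= lr * np.outer(delta1, x);  b1 -= lr * delta1

print(np.round(y, 2))  # close to the target [0, 1] after training
```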
Challenges of training deep networks
• Saturation
• Vanishing gradients
• Overfitting
• Slowness of second order methods
• Slow convergence, getting stuck in local optima with first
order methods
• (Exploding gradients)
Why now?
Breakthroughs in research
• Saturation & vanishing gradients
 Layer-by-layer training (2006)
 Non-saturating activation functions, e.g. ReLU (2013)
• Overfitting
 Dropout (2014)
• Convergence problems
 Adagrad, Adadelta, Adam, RMSProp, etc.
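Two of the ingredients listed above can be sketched briefly: the non-saturating ReLU activation and an Adagrad-style per-parameter update. The loss, learning rate, and iteration count below are arbitrary toy choices for illustration.

```python
import numpy as np

def relu(s):
    return np.maximum(0.0, s)   # gradient is 1 for s > 0: no saturation

def adagrad_update(w, grad, accum, lr=0.5, eps=1e-8):
    # Adagrad: accumulate squared gradients, scale the step per parameter
    accum += grad ** 2
    w -= lr * grad / (np.sqrt(accum) + eps)
    return w, accum

w = np.array([1.0, -2.0])       # toy parameters
accum = np.zeros_like(w)
for _ in range(200):
    grad = 2.0 * w              # gradient of the toy loss L = ||w||^2
    w, accum = adagrad_update(w, grad, accum)
print(np.round(w, 3))  # both parameters driven toward 0
```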
Computational power
• Natural increase in computational power
• GPGPU (general-purpose GPU) technology
Intermission
Don’t give in to the HYPE
• Deep learning is impressive but
 deep learning is not true AI
o it may be a component of it when
and if AI is created
 deep learning is not how the human
brain works
 95% of machine learning tasks don’t
require deep learning
 deep learning requires a lot of
computational power
• Deep learning is a tool
 which is successful in certain,
previously very challenging domains
(speech recognition, computer
vision, NLP, etc.)
 that excels in pattern recognition
You are here
Deep learning for RecSys
From the Netflix prize...
• Netflix prize (2006-2009)
 Gave a huge push to recommender systems research
 Determined the direction of research for years
 Task:
o Some (User, Item, Rating) known triplets
o (User, Item) pairs with unknown rating
o Predict the missing ratings (1-5)
... to recommenders in practice
• Ratings → events [implicit feedback]
 Lots of services don’t allow for rating
 Majority of users don’t rate
 Monitored passively → preferences have to be inferred
• Rating prediction → ranking [top-N recommendations]
 All that matters is the relevancy of the top N items
 Rating prediction is biased
• User → session / situation [session-based / context-driven
recommendation]
 Users are not logged in, identification is unreliable
 Accounts used by multiple users
 Aim of the session (e.g. buy a good laptop)
 Similar behavior of different users in a situation, different behavior of the same
user in different situations
Challenges in RecSys
• Session modeling
 Most of the algorithms are personalized
 A few are item-to-item
o Recommends similar items
o Also used for session-based recommendations (industry de facto standard)
 There are no good session-based solutions
• Incorporating factors that influence user clicks
 Users click based on what they see
o Title
o Product image
o Description
 and on their knowledge of the product
o Usually harder to model
o Except when the product is content (e.g. music)
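The item-to-item idea mentioned above can be sketched as co-occurrence counting with a simple normalization. The sessions below are made-up toy data, and the row normalization is one simple choice; real item-kNN implementations typically also normalize by item supports.

```python
import numpy as np

# Tiny made-up sessions: each is a list of item ids viewed together
sessions = [[0, 1, 2], [0, 1], [1, 2, 3], [0, 3]]
n_items = 4

# Count how often each item pair co-occurs within a session
counts = np.zeros((n_items, n_items))
for s in sessions:
    for i in s:
        for j in s:
            if i != j:
                counts[i, j] += 1

# Normalize rows, then recommend the items most similar to item 0
norms = np.sqrt((counts ** 2).sum(axis=1, keepdims=True)) + 1e-12
sim = counts / norms
top = np.argsort(-sim[0])
print(top[0])  # item 1 co-occurs with item 0 most often
```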
Deep learning to the rescue – Session modeling
• Recurrent Neural Networks (RNN)
 Sequence modeling
 Hidden state: next state is based on the previous hidden state and the current input
 “Infinite” depth
 More sophisticated versions: GRU, LSTM
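The hidden-state recurrence described above can be sketched with a plain (non-gated) RNN cell; GRU and LSTM add gating on top of this idea. Sizes and weights here are made-up toy values.

```python
import numpy as np

rng = np.random.default_rng(1)
d_in, d_hid = 4, 3
W_x = rng.normal(size=(d_hid, d_in)) * 0.5   # input-to-hidden weights
W_h = rng.normal(size=(d_hid, d_hid)) * 0.5  # hidden-to-hidden weights
b = np.zeros(d_hid)

def rnn_step(h_prev, x):
    # next state depends on the previous hidden state and the current input:
    # h_t = tanh(W_x x_t + W_h h_{t-1} + b)
    return np.tanh(W_x @ x + W_h @ h_prev + b)

h = np.zeros(d_hid)                      # initial hidden state
sequence = rng.normal(size=(5, d_in))    # a session of 5 "events"
for x in sequence:
    h = rnn_step(h, x)                   # state summarizes the session so far
print(h.shape)  # (3,)
```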
• Needs to be adapted to the recommendation task
• GRU4Rec:
 Session-parallel minibatch training for handling the large variance in session lengths
 Sampling the output for reasonable training times, without losing much accuracy
 Ranking loss for better item ranking
• Results: 15-30% improvement over item-to-item recommendations
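The ranking-loss idea can be illustrated with a BPR-style pairwise loss, a common choice for implicit feedback: the score of the item actually clicked next should exceed the scores of sampled negative items. The scores below are made-up numbers, not actual model outputs.

```python
import numpy as np

def bpr_loss(pos_score, neg_scores):
    # BPR: -mean log sigmoid(positive score - negative scores);
    # lower when the positive item is ranked above the negatives
    return -np.mean(np.log(1.0 / (1.0 + np.exp(-(pos_score - neg_scores)))))

neg = np.array([0.2, -0.1, 0.5])   # scores for sampled negative items
good = bpr_loss(2.0, neg)          # positive item scored clearly above
bad = bpr_loss(-1.0, neg)          # positive item scored below
print(good < bad)  # True: better ranking gives lower loss
```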
[Charts: Recall@20 and MRR@20 on the RSC15 and VIDEO datasets, comparing Item-kNN and GRU4Rec; GRU4Rec outperforms Item-kNN on both metrics and both datasets]
Other uses of deep learning for recsys
• Incorporating content directly
 Music, images, video, text
 Aspects of the items that influence users
 Direct content representation
• Context-state modeling from sensory data
 IoT devices
 Lots of sensory data
 Some of it missing or noisy
 Infer context state and recommend accordingly
• Interactive recommenders using chatbots
• Personalized content generation
 Today’s news
 Images in personalized style with personalized content
• Etc...
There is work to be done
• DL + RecSys research: just started
 Last year:
o 0 long papers; 1 short paper and 1 poster that are loosely connected
 This year:
o 10+ submissions to RecSys on this topic
o DLRS 2016 workshop @ RecSys
• Open questions
 (More) Application areas
 Adaptations required for the recsys problem
 Scalability
 Best practices
 ...
Thanks for your attention!