FEBRUARY 22, 2018, WARSAW
Factorization Machines for building
Recommender System
Paweł Łagodziński, SAS Institute Poland
Recommender Systems
• Primary goal – recommend most relevant items to users
Plenty of success stories:
• music, movies
• retail products
• adverts
• news
• query results
• social links
(Diagram: users, items, and context as inputs to the Recommender System)
Recommender Systems
• Key challenges
• sparse data
• extremely large data
• cold start
• Other issues
• different rating types
• difficult performance evaluation
Regression models
• Linear regression
  $\hat{y}(\mathbf{x}) = w_0 + \sum_{j=1}^{p} w_j x_j$
  $\mathbf{x}$ – input vector with p predictors, $y$ – observed rating, $\hat{y}$ – predicted rating
  $1 + p$ – number of parameters
  $O(p)$ – learning complexity
• Polynomial regression (2nd degree)
  $\hat{y}(\mathbf{x}) = w_0 + \sum_{j=1}^{p} w_j x_j + \sum_{j=1}^{p} \sum_{l=j+1}^{p} w_{jl}\, x_j x_l$
  $1 + p + p^2$ – number of parameters
  $O(p^2)$ – learning complexity
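To make the parameter counts concrete, here is a minimal NumPy sketch (added for illustration, not from the talk; all values are placeholders) that evaluates both models on a single input vector:

import numpy as np

p = 5                       # number of predictors
rng = np.random.default_rng(0)
x = rng.random(p)           # one input vector

# Linear regression: 1 + p parameters
w0 = 0.1
w = rng.normal(size=p)
y_hat_linear = w0 + w @ x

# 2nd-degree polynomial regression: one extra weight per predictor pair,
# i.e. on the order of 1 + p + p^2 parameters
W2 = rng.normal(size=(p, p))
pairwise = sum(W2[j, l] * x[j] * x[l]
               for j in range(p) for l in range(j + 1, p))
y_hat_poly = w0 + w @ x + pairwise

print(y_hat_linear, y_hat_poly)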
Regression models
• Pros
• Combine real-valued and categorical features
• Useful for regression and classification problems
• Fast learning for linear model
• Cons
• Linear regression does not include user-item interactions
• Polynomial regression includes them, but for sparse data only in theory
• For sparse data, pairwise interactions are observed very rarely; as a consequence,
their parameters cannot be estimated directly from the data
Latent factor models – matrix factorization
$Y \in \mathbb{R}^{|U| \times |I|}$ – matrix of observed ratings (U – users, I – items)

       i1  i2  i3  i4
  u1    2   3   1   ?
  u2    ?   5   ?   1
  u3    ?   ?   4   1
  u4    2   3   ?   2

• If matrix Y is low-rank, it can be approximated with $\hat{Y}$:
  $Y \approx \hat{Y} = W H^{T}$, where $W \in \mathbb{R}^{|U| \times k}$, $H \in \mathbb{R}^{|I| \times k}$,
  and k is the rank of $\hat{Y}$ (the size of the latent factor vectors)
• Rating predictions:
  $\hat{y}_{ui} = \sum_{j=1}^{k} w_{uj}\, h_{ij} = \langle \mathbf{w}_u, \mathbf{h}_i \rangle$
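A minimal NumPy sketch (added for illustration; not code from the talk) of how such a low-rank model predicts a missing rating and how a single SGD step updates the latent vectors. The learning rate and regularization are placeholder values:

import numpy as np

n_users, n_items, k = 4, 4, 2
rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(n_users, k))   # user latent factors
H = rng.normal(scale=0.1, size=(n_items, k))   # item latent factors

def predict(u, i):
    # predicted rating = dot product of user and item latent vectors
    return W[u] @ H[i]

def sgd_step(u, i, y, lr=0.05, reg=0.01):
    # one stochastic gradient step on the squared error for (u, i, y)
    err = y - predict(u, i)
    w_u, h_i = W[u].copy(), H[i].copy()
    W[u] += lr * (err * h_i - reg * w_u)
    H[i] += lr * (err * w_u - reg * h_i)

# observed ratings from the table above: (user index, item index, rating)
observed = [(0, 0, 2), (0, 1, 3), (0, 2, 1), (1, 1, 5), (1, 3, 1),
            (2, 2, 4), (2, 3, 1), (3, 0, 2), (3, 1, 3), (3, 3, 2)]
for _ in range(500):
    for u, i, y in observed:
        sgd_step(u, i, y)

print(predict(0, 3))   # estimate of the missing rating for (u1, i4)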
Latent factor models – matrix factorization
• Intuition behind matrix factorization with a low-rank (rank 2) matrix, after C.C. Aggarwal [1]
• Ratings: -1 – dislike, 0 – neutral, 1 – like
• correlations -> lower-rank matrix
• strong correlations -> works for sparse data

Y (7 users × 6 movies: Nero, Julius Caesar, Cleopatra, Sleepless in Seattle, Pretty Woman, Casablanca):
  1:  1  1  1  0  0  0
  2:  1  1  1  0  0  0
  3:  1  1  1  0  0  0
  4:  1  1  1  1  1  1
  5: -1 -1 -1  1  1  1
  6: -1 -1  1  1  1  1
  7: -1 -1 -1  1  1  1
(user groups marked on the slide: History – users 1-3, Both – user 4, Romance – users 5-7)

Y ≈ W H^T, with

W (users × latent factors History, Romance):
  1:  1  0
  2:  1  0
  3:  1  0
  4:  1  1
  5: -1  1
  6: -1  1
  7: -1  1

H^T (latent factors × movies):
  History:  1  1  1  0  0  0
  Romance:  0  0  1  1  1  1

Residual Y − Ŷ (non-zero only where the rank-2 approximation misses):
  1:  0  0  0  0  0  0
  2:  0  0  0  0  0  0
  3:  0  0  0  0  0  0
  4:  0  0 -1  0  0  0
  5:  0  0 -1  0  0  0
  6:  0  0  1  0  0  0
  7:  0  0 -1  0  0  0
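The toy example can be checked numerically; this short NumPy snippet (added for illustration) reproduces the rank-2 approximation and the residual above:

import numpy as np

# Y: 7 users x 6 movies (Nero, Julius Caesar, Cleopatra,
#    Sleepless in Seattle, Pretty Woman, Casablanca)
Y = np.array([[ 1,  1,  1, 0, 0, 0],
              [ 1,  1,  1, 0, 0, 0],
              [ 1,  1,  1, 0, 0, 0],
              [ 1,  1,  1, 1, 1, 1],
              [-1, -1, -1, 1, 1, 1],
              [-1, -1,  1, 1, 1, 1],
              [-1, -1, -1, 1, 1, 1]])

W = np.array([[1, 0], [1, 0], [1, 0], [1, 1],
              [-1, 1], [-1, 1], [-1, 1]])       # user factors (History, Romance)
Ht = np.array([[1, 1, 1, 0, 0, 0],              # History row
               [0, 0, 1, 1, 1, 1]])             # Romance row

Y_hat = W @ Ht
print(Y - Y_hat)   # non-zero only in the Cleopatra column, users 4-7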
Matrix factorization based models
• Many specific variants of factorization models were proposed
• Examples:
• MF (user-item matrix factorization): SVD++, Factorized kNN
• Time-Sensitive Factorization Models: timeSVD, timeTF
• Tensor Factorization Models: PARAFAC, PITF
• Sequential Models: FMC, FPMC
(after S. Rendle, MLConf 2014)
Matrix factorization based models
• Pros
• Enable modelling of interactions (user-item-…) even if such events are not
observed in the data
• Effective on sparse data
• Cons
• Learning algorithms are tailored to each specific model
• As a result, it is hard to adjust the model to the problem under consideration
Factorization Machines
• Concept generalizing regression and factorization models
• Introduced in 2010 (S. Rendle [2])
• Very good performance in data science challenges (after S. Rendle, MLConf 2014):
• Click prediction – KDDCup 2012: Track 2 (3rd place), Criteo Display Advertising Challenge (1st place)
• Rating prediction – Netflix Prize, MovieLens, KDDCup 2011, EMI Music Data Science Hackathon (2nd place)
• Social link prediction (social media) - KDDCup 2012: Track 1 (2nd place)
• Recommend given name - ECML/PKDD Discovery Challenge 2013 (1st place on-line track, 2nd place
off-line track)
• Predicting result of next exam question - KDDCup 2010, Grockit Challenge (1st place)
• Prediction of auction sale price – Kaggle, Blue Book for Bulldozers (1st place)
Factorization Machines - model
• Factorization Machines (2nd degree)
  $\hat{y}(\mathbf{x}) = w_0 + \sum_{j=1}^{p} w_j x_j + \sum_{j=1}^{p} \sum_{l=j+1}^{p} \langle \mathbf{v}_j, \mathbf{v}_l \rangle\, x_j x_l$
  $\mathbf{x}$ – input vector with p predictors, $y$ – observed rating, $\hat{y}$ – predicted rating
  $\mathbf{v}_j \in \mathbb{R}^k$ – latent factor vector of the j-th feature
  $1 + p + kp$ – number of parameters
  $O(pk)$ – learning complexity (lower than $O(p^2)$ for 2nd-degree polynomial regression)
• FMs of higher degree are possible but not used in practice
• Parameters can be estimated with general-purpose optimization methods,
for instance Stochastic Gradient Descent
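A minimal NumPy sketch (added for illustration; not the FactMac implementation) of the 2nd-degree FM prediction, using Rendle's reformulation of the pairwise term so that the whole prediction costs O(pk):

import numpy as np

def fm_predict(x, w0, w, V):
    # 2nd-degree FM prediction in O(p*k).
    # x: (p,) feature vector, w0: global bias, w: (p,) linear weights,
    # V: (p, k) latent factor matrix (one k-dimensional vector per feature).
    linear = w0 + w @ x
    # pairwise term sum_{j<l} <v_j, v_l> x_j x_l computed via the identity
    # 0.5 * sum_f [ (sum_j V[j,f] x_j)^2 - sum_j (V[j,f] x_j)^2 ]
    s = V.T @ x                      # shape (k,)
    s_sq = (V ** 2).T @ (x ** 2)     # shape (k,)
    pairwise = 0.5 * np.sum(s ** 2 - s_sq)
    return linear + pairwise

# illustrative call with random (placeholder) parameters
p, k = 8, 3
rng = np.random.default_rng(1)
x = rng.random(p)
print(fm_predict(x, 0.0, rng.normal(size=p), rng.normal(size=(p, k))))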
Factorization Machines
• Feature engineering enables great expressiveness of the FM model
• Matrix / tensor data can easily be transformed into a feature vector
Input Y (User U × Item I):
       i1  i2  i3  i4
  u1    2   3   1   ?
  u2    ?   5   ?   1
  u3    ?   ?   4   1

Observed ratings as (user, item, rating) triplets:
  #   U    I    y
  1   u1   i1   2
  2   u1   i2   3
  3   u1   i3   1
  4   u2   i2   5
  5   u2   i4   1
  6   u3   i3   4
  7   u3   i4   1

Feature vectors x (one-hot encoding of user and item) with target y:
        u1 u2 u3 | i1 i2 i3 i4 | target y
  x1     1  0  0 |  1  0  0  0 |    2
  x2     1  0  0 |  0  1  0  0 |    3
  x3     1  0  0 |  0  0  1  0 |    1
  x4     0  1  0 |  0  1  0  0 |    5
  x5     0  1  0 |  0  0  0  1 |    1
  x6     0  0  1 |  0  0  1  0 |    4
  x7     0  0  1 |  0  0  0  1 |    1
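A small helper (added for illustration; the function and variable names are made up for this sketch) that builds exactly this one-hot design matrix from the rating triplets:

import numpy as np

# (user, item, rating) triplets from the table above
ratings = [("u1", "i1", 2), ("u1", "i2", 3), ("u1", "i3", 1),
           ("u2", "i2", 5), ("u2", "i4", 1),
           ("u3", "i3", 4), ("u3", "i4", 1)]

users = sorted({u for u, _, _ in ratings})
items = sorted({i for _, i, _ in ratings})

def one_hot(u, i):
    # feature vector = one-hot user block followed by one-hot item block
    x = np.zeros(len(users) + len(items))
    x[users.index(u)] = 1.0
    x[len(users) + items.index(i)] = 1.0
    return x

X = np.vstack([one_hot(u, i) for u, i, _ in ratings])
y = np.array([r for _, _, r in ratings], dtype=float)
print(X)   # 7 x 7 design matrix matching the table above
print(y)   # targets: 2 3 1 5 1 4 1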
Factorization Machines
• Applying the FM model
(the same one-hot user/item design matrix as on the previous slide)
With this encoding, FM is equivalent to a combination of MF (the user-item
interaction) plus per-user and per-item biases
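To spell out why (added reasoning, using the FM equation above): when only one user feature $x_u = 1$ and one item feature $x_i = 1$ are non-zero, all sums in the FM equation collapse to those two features, so

$\hat{y}(\mathbf{x}) = w_0 + w_u + w_i + \langle \mathbf{v}_u, \mathbf{v}_i \rangle$

which is exactly matrix factorization with user and item biases.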
Factorization Machines
• Adding categorical context information gives another MF-style model
FM then becomes equivalent to the PITF model, one of the extensions of MF:
$\hat{y}_{uic} = w_0 + w_u + w_i + w_c + \langle \mathbf{v}_u, \mathbf{v}_i \rangle + \langle \mathbf{v}_u, \mathbf{v}_c \rangle + \langle \mathbf{v}_i, \mathbf{v}_c \rangle$
Feature vectors x (one-hot users, items, and context) with target y:
        u1 u2 u3 | i1 i2 i3 i4 | c1 c2 c3 | target y
  x1     1  0  0 |  1  0  0  0 |  0  0  1 |    2
  x2     1  0  0 |  0  1  0  0 |  1  0  0 |    3
  x3     1  0  0 |  0  0  1  0 |  1  0  0 |    1
  x4     0  1  0 |  0  1  0  0 |  0  1  0 |    5
  x5     0  1  0 |  0  0  0  1 |  1  0  0 |    1
  x6     0  0  1 |  0  0  1  0 |  0  1  0 |    4
  x7     0  0  1 |  0  0  0  1 |  0  0  1 |    1
Factorization Machines
• Factorization Machines offer a combination of regression and factorization models
• Low-rank matrix approximation enables estimation of unobserved interactions
• Effective for sparse and extremely sparse data
• Flexible through feature engineering
• One of the most disruptive algorithms of recent years
FactMac – Factorization Machines in SAS Viya
• 2-nd degree Factorization Machines
• Learning with „HOGWILD!”
• Hyper-parameter optimization with Grid Search or a Genetic Algorithm
• Different deployment options
• SAS Viya
• In-database
• In-stream (SAS ESP)
• Considered for the next releases
• Field-Aware Factorization Machines, FFM [3] – better predictions than FM
• Warm-up restart – faster retraining
• Other loss functions
Shot Recommender System for NBA Coaches
• Application of Factorization Machines in SAS Viya to sport data
• Presented during the 2016 KDD Workshop on Large-Scale Sports Analytics [5]
• Goal – recommend shot types for players
• Data - recorded shots taken during the 2015-2016 NBA games
• shots for 420 players
• for each shot: player, action type, shot zone range, shot zone area, result,
and other characteristics
• Model for shot prediction (FM 2-d, k=25, p = 404), log-odds as a target
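The exact preprocessing from [5] is not shown on the slides; as an illustration of how a log-odds target could be built, one might aggregate made/attempted counts per (player, shot type) cell and feed the smoothed logit to the FM. The smoothing constant below is a placeholder, not a value from the paper:

import numpy as np

def logit_target(made, attempts, alpha=0.5):
    # log-odds of shot success with additive smoothing to avoid +/- infinity
    p = (made + alpha) / (attempts + 2 * alpha)
    return np.log(p / (1 - p))

# hypothetical aggregated cell: 7 made shots out of 15 attempts
print(logit_target(7, 15))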
Shot Recommender System for NBA Coaches
• Biases can be very informative
Shot Recommender System for NBA Coaches
• Latent factors analysis
Shot Recommender System for NBA Coaches
• Shot recommendations (plogit):
Shot Recommender System for NBA Coaches
• Demonstration of the concept on NCAA* basketball data
• The National Collegiate Athletic Association (NCAA) is a non-profit organization which regulates athletes
of 1,281 institutions, conferences, and individuals (source: Wikipedia)
• Recent news for Kagglers: a new competition was launched two days before this talk
Where to use Factorization Machines?
• Rule of Thumb
• design matrix is very sparse (mostly missing data, 90% missing or more)
• the cardinality of the nominal variables is very high and each level of the nominal variables
has only a few representatives in the data
• you have good reason to believe the design matrix has low rank
• Findings in [3] seem to confirm the above: FFM worked better for CTR prediction but not so well for denser
data (web phishing, census income)
• Our (local SAS office) experiments, still in progress, showed that even for data that is not very sparse, FM
can produce a robust set of baseline predictions. More specialized, tailored models can
outperform FM but require more development effort
• FactMac was tested with the SAS Customer Intelligence 360 solution, with good results.
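As a quick, illustrative check of the first rule of thumb (added sketch, not from the talk), one can measure how sparse the one-hot design matrix actually is before deciding on FM:

import numpy as np

def design_matrix_sparsity(X):
    # fraction of zero entries in a (dense) design matrix X
    return 1.0 - np.count_nonzero(X) / X.size

# e.g. a one-hot user/item design matrix with 3 users and 4 items:
# each row has exactly 2 non-zeros out of 7 columns
X = np.zeros((7, 7))
X[np.arange(7), [0, 0, 0, 1, 1, 2, 2]] = 1                 # user block
X[np.arange(7), 3 + np.array([0, 1, 2, 1, 3, 2, 3])] = 1   # item block
print(design_matrix_sparsity(X))   # ~0.71 here; real user/item catalogs are typically > 0.9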
Thank You !
References
• [1] Charu C. Aggarwal. "Recommender Systems: The Textbook". Springer, 2016.
• [2] Steffen Rendle. "Factorization Machines". In Proceedings of the 10th IEEE
International Conference on Data Mining (ICDM). Piscataway, NJ: Institute of Electrical
and Electronics Engineers, 2010.
• [3] Yuchin Juan et al. "Field-aware Factorization Machines for CTR Prediction". RecSys
'16: Proceedings of the 10th ACM Conference on Recommender Systems, 2016.
• [4] Jason Lee et al. "Practical Large-Scale Optimization for Max-Norm Regularization".
Advances in Neural Information Processing Systems 23 (NIPS), 2010.
• [5] Raymond E. Wright et al. "Shot Recommender System for NBA Coaches". KDD Workshop on
Large-Scale Sports Analytics, 2016.