Non-Negative Matrix
Factorization
A quick tutorial
Matrices (also Matrixes)
In mathematics, a matrix (plural matrices) is a
rectangular array of numbers arranged in rows
and columns. The individual items in a matrix
are called its elements or entries.
An example of a matrix with 2 rows and 3 columns is:

[  1   9  -13 ]
[ 20   5   -6 ]
Source: Wikipedia
Size of Matrices
The size of a matrix is defined by the number of
rows and columns that it contains. A matrix with
m rows and n columns is called an m × n matrix
or m-by-n matrix, while m and n are called its
dimensions.
Basic Matrix Operations
We’ll discuss some basic matrix operations next, and we’ll
practice by generating random matrices* using a simple
python script that can be found here:
http://bit.ly/mgen-py
*Please applaud instructor level of difficulty and excuse whiteboard arithmetic mistakes.
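The mgen script itself isn't reproduced here, but a minimal stand-in (the function name mgen and its behavior are assumptions, not the actual script) might look like:

import numpy as np

def mgen(rows, cols, low=0, high=10):
    # rows x cols matrix of random integers in [low, high)
    return np.random.randint(low, high, size=(rows, cols))

A = mgen(2, 3)   # e.g. a random 2 x 3 matrix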
Adding (and Subtracting) Matrices
For matrices of the same dimensions, add the corresponding elements together; the result is another matrix of the same dimensions.
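A quick NumPy check of element-wise addition (the example values are illustrative):

import numpy as np

A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])
A + B   # [[ 6  8]
        #  [10 12]] -- same 2 x 2 shape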
Scalar Multiplication
To multiply a matrix A by a scalar i (any real number), simply multiply each element of A by i.
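For example (values illustrative):

import numpy as np

3 * np.array([[1, 2], [3, 4]])   # [[ 3  6]
                                 #  [ 9 12]]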
Transposing Matrices
Reflect a matrix along its diagonal by swapping the matrix's rows and columns, so that an m x n matrix becomes an n x m matrix.
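For example (values illustrative):

import numpy as np

A = np.array([[1, 2, 3],
              [4, 5, 6]])   # 2 x 3
A.T                         # 3 x 2: [[1 4]
                            #         [2 5]
                            #         [3 6]]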
Matrix Multiplication
A matrix product between a matrix A with size n x m and B
with size m x p will produce an n x p matrix in which the m
columns of A are multiplied against the m rows of B as
follows:
The Dot Product of the
row in A and column in B
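In NumPy (shapes illustrative), note how the inner dimension m must match:

import numpy as np

A = np.random.rand(2, 3)   # n x m
B = np.random.rand(3, 4)   # m x p
C = A @ B                  # n x p, i.e. 2 x 4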
The Dot Product
The dot product of two vectors (e.g. 1 x m or m x 1 matrices) of the same size is the sum of the products of their corresponding elements.
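For example:

import numpy as np

u = np.array([1, 2, 3])
v = np.array([4, 5, 6])
u @ v   # 1*4 + 2*5 + 3*6 = 32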
Matrix Sparsity (or Density)
Density is the ratio of non-zero elements to the total number of elements; sparsity is 1 minus that ratio.
[ 11 22 0 0 0 0 0 ]
[ 0 33 44 0 0 0 0 ]
[ 0 0 55 66 77 0 0 ]
[ 0 0 0 0 0 88 0 ]
[ 0 0 0 0 0 0 99 ]
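For example, computing the density and sparsity of the matrix above:

import numpy as np

A = np.array([[11, 22,  0,  0,  0,  0,  0],
              [ 0, 33, 44,  0,  0,  0,  0],
              [ 0,  0, 55, 66, 77,  0,  0],
              [ 0,  0,  0,  0,  0, 88,  0],
              [ 0,  0,  0,  0,  0,  0, 99]])
density = np.count_nonzero(A) / A.size   # 9 / 35 ≈ 0.26
sparsity = 1 - density                   # ≈ 0.74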
Matrix Factorization (or Decomposition)
Given an n x m matrix, R - find two smaller matrices, P and Q, with k-dimensional features - i.e., P is size n x k and Q is size m x k - such that the product of P and the transpose of Q approximates R. (Increasing k allows for more accuracy when reconstructing the original matrix.)
Note: Sparse Matrix Factorization is used when the matrix is populated primarily by zeros
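A quick shape check of the idea (the sizes here are illustrative):

import numpy as np

n, m, k = 4, 5, 2
P = np.random.rand(n, k)
Q = np.random.rand(m, k)
R_approx = P @ Q.T   # n x m, the same shape as R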
Recommendations
In order to compute recommendations, we will
construct a users x movies matrix such that
every element of the matrix is the user’s rating,
if any:
        Star Wars   Bridget Jones   The Hobbit   ...
Bob     5           2               0
Joe     3           4               2
Jane    0           0               3
...     ...         ...             ...
As you can see - this is a pretty sparse matrix!
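In code, this ratings matrix might be represented as follows (zeros stand in for missing ratings; values are from the table above):

import numpy as np

R = np.array([[5, 2, 0],    # Bob
              [3, 4, 2],    # Joe
              [0, 0, 3]],   # Jane
             dtype=float)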
Factored Matrices
● P is the weights matrix. It has a row for each row in the original matrix (user) and a column for each latent feature.
● Q is the features matrix. It has a row for each column in the original matrix (movie) and a column for each feature.
● Therefore, when you take the dot product of a row of P with a row of Q, you're summing the latent features times their weights to produce each element of the original matrix.
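As a small sketch of that last point (shapes follow the P: n x k, Q: m x k convention above; the sizes are illustrative):

import numpy as np

P = np.random.rand(3, 2)   # 3 users x 2 latent features
Q = np.random.rand(3, 2)   # 3 movies x 2 latent features
P[0] @ Q[1]                # predicted rating of movie 1 by user 0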
Latent Features
● The features and weights described before measure “latent” features - i.e., hidden features that are not directly human-describable, though they could relate to things like genre.
● You need fewer features than items; otherwise the best answer is simply one feature per item (no similarity captured).
● The magic is where there are zero values -
the product will fill them in, predicting their
value.
Non-negative
Called non-negative matrix factorization because it returns features and weights with no negative values; all feature and weight values must therefore be zero or positive.
Clustering
Non-Negative Matrix Factorization is closely
related to both supervised and unsupervised
methodologies (supervised because R can be
seen as a training set) - but in particular NNMF
is closely related to other clustering
(unsupervised) algorithms.
Gradient Descent
● The technique we will use to
factor is called gradient
descent, which attempts to
minimize error.
● We can calculate the error of our product using the squared error: (actual − predicted)².
● Once we know the error, we can calculate the gradient in order to figure out which direction to go to minimize the error. We keep going until the error falls below a threshold (it rarely reaches exactly zero).
The Algorithm
→ Initialize P and Q with random small numbers
→ for step until max_steps:
for row, col in R:
if R[row][col] > 0:
compute error of element
compute gradient from error
update P and Q with new entry
compute total error
if error < some threshold:
break
return P, Q.T
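A minimal runnable NumPy sketch of this loop (the function name factor and the defaults for k, alpha, max_steps, and the threshold are illustrative assumptions):

import numpy as np

def factor(R, k=2, max_steps=5000, alpha=0.0002, threshold=0.001):
    # initialize P and Q with random small numbers
    n, m = R.shape
    P = np.random.rand(n, k)
    Q = np.random.rand(m, k)
    for step in range(max_steps):
        for u in range(n):
            for i in range(m):
                if R[u, i] > 0:                          # observed entries only
                    e = R[u, i] - P[u, :] @ Q[i, :]      # error of this element
                    p_u = P[u, :].copy()
                    P[u, :] += alpha * 2 * e * Q[i, :]   # step along the gradient
                    Q[i, :] += alpha * 2 * e * p_u
        # total squared error over the observed entries
        total = sum((R[u, i] - P[u, :] @ Q[i, :]) ** 2
                    for u in range(n) for i in range(m) if R[u, i] > 0)
        if total < threshold:
            break
    return P, Q.T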
Needed Computations
Compute* the predicted element for each user-movie pair (the dot product of the corresponding row in P and row in Q):
Compute the squared error for each user-movie
pair (in order to compute the gradient):
*computations for non-zero values only
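Reconstructed in standard notation (the slide's equation images are not included in this transcript), the predicted element and its squared error are:

$$\hat{r}_{ui} = P_u \cdot Q_i = \sum_{k=1}^{K} P_{uk} Q_{ik}$$

$$e_{ui}^2 = (r_{ui} - \hat{r}_{ui})^2 = \Big(r_{ui} - \sum_{k=1}^{K} P_{uk} Q_{ik}\Big)^2$$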
Needed Computations
Find the gradient (the slope of the error curve) by taking the partial derivative of the error with respect to each element:
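In the same notation, differentiating the squared error with respect to each entry of P and Q gives:

$$\frac{\partial e_{ui}^2}{\partial P_{uk}} = -2\, e_{ui}\, Q_{ik} \qquad \frac{\partial e_{ui}^2}{\partial Q_{ik}} = -2\, e_{ui}\, P_{uk}$$

where $e_{ui} = r_{ui} - \hat{r}_{ui}$.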
Update Rule
Update each element in P and Q by using a
learning rate (called α) - this determines how
far to travel along the gradient. α is usually
small, because if we choose a step size that is
too large, we could miss the minimum.
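The resulting update rule (again reconstructed in standard notation) moves each entry against its gradient:

$$P_{uk} \leftarrow P_{uk} + 2\alpha\, e_{ui}\, Q_{ik} \qquad Q_{ik} \leftarrow Q_{ik} + 2\alpha\, e_{ui}\, P_{uk}$$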
Convergence and Regularization
We converge once the sum of the errors has
reached some threshold, usually very small.
Extending this algorithm will introduce
regularization to avoid overfitting by adding a
beta parameter. This forces the algorithm to
control the magnitudes of the feature vectors.
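One common form of the regularized objective and update (a sketch; the slide does not show the exact equations) is:

$$e_{ui}^2 = (r_{ui} - \hat{r}_{ui})^2 + \frac{\beta}{2} \sum_{k=1}^{K} \big(P_{uk}^2 + Q_{ik}^2\big)$$

$$P_{uk} \leftarrow P_{uk} + \alpha\,(2 e_{ui} Q_{ik} - \beta P_{uk}) \qquad Q_{ik} \leftarrow Q_{ik} + \alpha\,(2 e_{ui} P_{uk} - \beta Q_{ik})$$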
Predicted Recommendations
Our predicted matrix for the movie ratings will end up looking something like the following:
Where there were zeros before, we now have
predictions!
        Star Wars    Bridget Jones   The Hobbit   ...
Bob     4.98148768   2.02748447      3.29852779
Joe     3.0157327    3.968359        2.01139212
Jane    4.50410968   2.93580899      2.98826278
...     ...          ...             ...
As you can see - the predictions are very close to the actual values, but we now
suspect that Jane will really like Star Wars!
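Reusing the factor sketch from "The Algorithm" and the ratings matrix R defined earlier, predictions like the table above can be produced as follows (exact numbers vary from run to run, since P and Q start out random):

P, Qt = factor(R, k=2)
R_hat = P @ Qt              # predicted ratings matrix
print(np.round(R_hat, 2))   # zeros replaced by predicted values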
