Part 3.
Nonnegative Matrix Factorization ⇔ K-means and Spectral Clustering

PCA & Matrix Factorization for Learning, ICML 2005 Tutorial, Chris Ding
Nonnegative Matrix Factorization (NMF)

Data matrix, n points in p dimensions (each $x_i$ is an image, document, webpage, etc.):

$X = (x_1, x_2, \ldots, x_n)$

Decomposition (low-rank approximation):

$X \approx F G^T$

Nonnegative matrices:

$X_{ij} \ge 0, \quad F_{ij} \ge 0, \quad G_{ij} \ge 0$

$F = (f_1, f_2, \ldots, f_k) \qquad G = (g_1, g_2, \ldots, g_k)$
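As a shape check, here is a minimal numpy sketch of the factorization; the data, rank, and sizes are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
p, n, k = 50, 200, 5              # illustrative: 50-dim data, 200 points, rank 5

X = rng.random((p, n))            # nonnegative data, one point x_i per column
F = rng.random((p, k))            # nonnegative basis vectors f_1, ..., f_k
G = rng.random((n, k))            # nonnegative coefficients g_1, ..., g_k

X_hat = F @ G.T                   # low-rank approximation X ≈ F G^T
print(X_hat.shape == X.shape)     # True
print(np.linalg.norm(X - X_hat))  # reconstruction error ||X - F G^T||
```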
Some historical notes
• Earlier work from the statistics community
• P. Paatero (1994), Environmetrics
• Lee and Seung (1999, 2000)
  – Parts of the whole (no cancellation)
  – A multiplicative update algorithm




Pixel vector:

$(0.0,\; 0.5,\; 0.7,\; 1.0,\; \ldots,\; 0.8,\; 0.2,\; 0.0)^T$




Parts-based perspective

$X = (x_1, x_2, \ldots, x_n)$

$F = (f_1, f_2, \ldots, f_k) \qquad G = (g_1, g_2, \ldots, g_k)$

$X \approx F G^T$



Sparsify F to get a parts-based picture

$X \approx F G^T, \quad F = (f_1, f_2, \ldots, f_k)$

(Li et al., 2001; Hoyer, 2003)




Theorem: NMF = kernel K-means clustering

NMF produces a holistic model of the data.
Theoretical results and experimental verification.

(Ding, He, Simon, 2005)


Our Results: NMF = Data Clustering




Theorem: K-means = NMF
• Reformulate K-means and kernel K-means as

  $\max_{H^T H = I,\; H \ge 0} \mathrm{Tr}(H^T W H)$

• Show equivalence to

  $\min_{H^T H = I,\; H \ge 0} \| W - H H^T \|^2$
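To make the reformulation concrete: with a normalized cluster-indicator matrix H (columns orthonormal by construction), Tr(H^T W H) sums within-cluster similarities. A small numpy sketch, with toy data and an arbitrary labeling made up for illustration:

```python
import numpy as np

def indicator(labels, k):
    """Normalized cluster indicator: H[i,j] = 1/sqrt(n_j) if point i is in
    cluster j, else 0, so the columns are orthonormal and H^T H = I."""
    H = np.zeros((len(labels), k))
    for j in range(k):
        idx = np.where(labels == j)[0]
        H[idx, j] = 1.0 / np.sqrt(len(idx))   # assumes every cluster is nonempty
    return H

rng = np.random.default_rng(1)
X = rng.random((10, 40))                  # toy data, 40 points in 10 dims
W = X.T @ X                               # linear-kernel similarity W = X^T X
labels = rng.integers(0, 3, size=40)      # an arbitrary 3-way clustering
H = indicator(labels, 3)
print(np.allclose(H.T @ H, np.eye(3)))    # True: H^T H = I
print(np.trace(H.T @ W @ H))              # the K-means objective Tr(H^T W H)
```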




NMF = K-means

$H = \arg\max_{H^T H = I,\; H \ge 0} \mathrm{Tr}(H^T W H)$

$\;\;= \arg\min_{H^T H = I,\; H \ge 0} \big[ -2\,\mathrm{Tr}(H^T W H) \big]$

$\;\;= \arg\min_{H^T H = I,\; H \ge 0} \big[ \|W\|^2 - 2\,\mathrm{Tr}(H^T W H) + \|H^T H\|^2 \big]$

$\;\;= \arg\min_{H^T H = I,\; H \ge 0} \| W - H H^T \|^2$

$\;\;\Rightarrow \arg\min_{H \ge 0} \| W - H H^T \|^2$

(Adding $\|W\|^2$ and $\|H^T H\|^2$ changes nothing: both are constants under the constraint $H^T H = I$. The final step relaxes the orthogonality constraint, leaving symmetric NMF.)
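The chain hinges on the identity $\|W - HH^T\|^2 = \|W\|^2 - 2\,\mathrm{Tr}(H^T W H) + \|H^T H\|^2$; a quick numerical check with a random symmetric W and a random orthonormal H:

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.random((6, 6))
W = A + A.T                              # random symmetric similarity matrix
H, _ = np.linalg.qr(rng.random((6, 3)))  # any H with H^T H = I

lhs = np.linalg.norm(W - H @ H.T, 'fro')**2
rhs = (np.linalg.norm(W, 'fro')**2
       - 2 * np.trace(H.T @ W @ H)
       + np.linalg.norm(H.T @ H, 'fro')**2)
print(np.isclose(lhs, rhs))             # True: the expansion used on this slide
```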
Spectral Clustering = NMF

Normalized Cut:

$J_{\mathrm{Ncut}} = \sum_{\langle k,l \rangle} \left( \frac{s(C_k, C_l)}{d_k} + \frac{s(C_k, C_l)}{d_l} \right) = \sum_k \frac{s(C_k, G - C_k)}{d_k} = \frac{h_1^T (D - W) h_1}{h_1^T D h_1} + \cdots + \frac{h_k^T (D - W) h_k}{h_k^T D h_k}$

Unsigned cluster indicators ($n_k$ ones for cluster $k$):

$y_k = D^{1/2} (0 \cdots 0, \overbrace{1 \cdots 1}^{n_k}, 0 \cdots 0)^T \,/\, \| D^{1/2} h_k \|$

Rewrite:

$J_{\mathrm{Ncut}}(y_1, \ldots, y_k) = y_1^T (I - \widetilde{W}) y_1 + \cdots + y_k^T (I - \widetilde{W}) y_k = \mathrm{Tr}\big(Y^T (I - \widetilde{W}) Y\big), \qquad \widetilde{W} = D^{-1/2} W D^{-1/2}$

Optimize:

$\max_Y \mathrm{Tr}(Y^T \widetilde{W} Y), \quad \text{subject to } Y^T Y = I$

Normalized Cut $\;\Rightarrow\; \min_{H^T H = I,\; H \ge 0} \| \widetilde{W} - H H^T \|^2$

(Gu et al., 2001)
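A sketch of the normalized-similarity construction and its spectral relaxation: the trace maximization over $Y^T Y = I$ is solved by the top-k eigenvectors of $\widetilde{W}$ (Ky Fan theorem). The affinity matrix below is random, purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.random((8, 8))
W = (A + A.T) / 2                      # symmetric nonnegative affinity matrix
d = W.sum(axis=1)                      # degrees
D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
W_tilde = D_inv_sqrt @ W @ D_inv_sqrt  # W~ = D^{-1/2} W D^{-1/2}

# max Tr(Y^T W~ Y) s.t. Y^T Y = I: take eigenvectors of the k largest eigenvalues
k = 2
vals, vecs = np.linalg.eigh(W_tilde)   # ascending eigenvalues
Y = vecs[:, -k:]
print(np.trace(Y.T @ W_tilde @ Y), vals[-k:].sum())  # equal
```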
Advantages of NMF over standard K-means

                                           Soft clustering
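The contrast in one line: a K-means indicator row is 0/1, while a row of the NMF factor G carries graded, posterior-like weights. The values below are purely illustrative:

```python
import numpy as np

h_hard = np.array([0.0, 1.0, 0.0])   # K-means: hard assignment to cluster 2
g_soft = np.array([0.1, 0.7, 0.2])   # NMF row of G: graded membership
print(g_soft / g_soft.sum())         # normalized, posterior-like soft weights
```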




Experiments on Internet Newsgroups

NG2:  comp.graphics
NG9:  rec.motorcycles
NG10: rec.sport.baseball
NG15: sci.space
NG18: talk.politics.mideast

100 articles from each group; 1000 words; tf.idf weighting; cosine similarity.

(figure: cosine similarity matrix)

Accuracy of clustering results:

    K-means    NMF (W = HH')
    0.531      0.612
    0.491      0.590
    0.576      0.608
    0.632      0.652
    0.697      0.711
Summary for Symmetric NMF
• K-means, kernel K-means
• Spectral clustering

  $\max_{H^T H = I,\; H \ge 0} \mathrm{Tr}(H^T W H)$

• Equivalence to

  $\min_{H^T H = I,\; H \ge 0} \| W - H H^T \|^2$




Nonsymmetric NMF
• K-means, kernel K-means
• Spectral clustering

  $\max_{H^T H = I,\; H \ge 0} \mathrm{Tr}(H^T W H)$

• Equivalence to

  $\min_{H^T H = I,\; H \ge 0} \| W - H H^T \|^2$




Non-symmetric NMF
Rectangular Data Matrix ⇔ Bipartite Graph

         •   Information Retrieval: word-to-document
         •   DNA gene expressions
         •   Image pixels
         •   Supermarket transaction data

K-means Clustering of Bipartite Graphs

Simultaneous clustering of rows and columns:

Row indicators:

$F = (f_1, f_2, \ldots) = \begin{pmatrix} 1 & 0 \\ 1 & 0 \\ 0 & 1 \\ 0 & 1 \end{pmatrix}$

Column indicators:

$G = (g_1, g_2, \ldots) = \begin{pmatrix} 1 & 0 \\ 0 & 1 \\ 1 & 0 \\ 0 & 1 \end{pmatrix}$

$J_{\mathrm{Kmeans}} = \sum_{j=1}^{k} \frac{s(B_{R_j, C_j})}{\sqrt{|R_j|\,|C_j|}} = \mathrm{Tr}(F^T B G)$
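A numerical check of Tr(F^T B G) against the cut form, using the slide's two-cluster row/column partition and normalized indicators (entries $1/\sqrt{|\text{cluster}|}$, so $F^T F = G^T G = I$); the matrix B is a made-up word-by-document example:

```python
import numpy as np

def norm_indicator(labels, k):
    """Normalized indicator: 1/sqrt(cluster size) on members, so H^T H = I."""
    labels = np.asarray(labels)
    H = np.zeros((len(labels), k))
    for j in range(k):
        idx = np.where(labels == j)[0]
        H[idx, j] = 1.0 / np.sqrt(len(idx))
    return H

B = np.array([[3.0, 0.0, 2.0, 1.0],   # toy word-by-document matrix
              [2.0, 1.0, 3.0, 0.0],
              [0.0, 4.0, 1.0, 3.0],
              [1.0, 2.0, 0.0, 4.0]])
F = norm_indicator([0, 0, 1, 1], 2)   # row clusters {1,2}, {3,4} as on the slide
G = norm_indicator([0, 1, 0, 1], 2)   # column clusters {1,3}, {2,4} as on the slide

trace_obj = np.trace(F.T @ B @ G)
cut_form = sum(B[np.ix_(R, C)].sum() / np.sqrt(len(R) * len(C))
               for R, C in (([0, 1], [0, 2]), ([2, 3], [1, 3])))
print(np.isclose(trace_obj, cut_form))  # True
```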
NMF = K-means clustering

$H = \arg\max\; \mathrm{Tr}(F^T B G)$

$\;\;= \arg\min\; \mathrm{Tr}(-2\, F^T B G)$

$\;\;= \arg\min\; \big[ \|B\|^2 - 2\,\mathrm{Tr}(F^T B G) + \mathrm{Tr}(F^T F\, G^T G) \big]$

$\;\;= \arg\min\; \| B - F G^T \|^2$

all subject to $F^T F = I,\; F \ge 0$ and $G^T G = I,\; G \ge 0$.
Solving NMF with nonnegative least squares

$J = \| X - F G^T \|^2, \quad F \ge 0,\; G \ge 0$

Fix F, solve for G; fix G, solve for F:

$J = \sum_{i=1}^{n} \| x_i - F \tilde{g}_i \|^2, \qquad G = \begin{pmatrix} \tilde{g}_1^T \\ \vdots \\ \tilde{g}_n^T \end{pmatrix}$

Iterate; this converges to a local minimum.
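A sketch of the alternating scheme, using scipy.optimize.nnls for each row subproblem; the sizes, random data, and iteration count are arbitrary, and per-row NNLS is simple but slow compared to block solvers:

```python
import numpy as np
from scipy.optimize import nnls

def nmf_anls(X, k, n_iter=20, seed=0):
    """Alternating nonnegative least squares for X ≈ F G^T (a minimal sketch)."""
    rng = np.random.default_rng(seed)
    p, n = X.shape
    F = rng.random((p, k))
    G = rng.random((n, k))
    for _ in range(n_iter):
        # Fix F: row i of G minimizes ||x_i - F g~_i|| subject to g~_i >= 0
        for i in range(n):
            G[i], _ = nnls(F, X[:, i])
        # Fix G: row j of F minimizes ||X[j,:] - G f~_j|| subject to f~_j >= 0
        for j in range(p):
            F[j], _ = nnls(G, X[j, :])
    return F, G

X = np.random.default_rng(4).random((20, 30))   # toy nonnegative data
F, G = nmf_anls(X, k=3)
print(np.linalg.norm(X - F @ G.T))              # error at the local minimum found
```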

Solving NMF with multiplicative updating

$J = \| X - F G^T \|^2, \quad F \ge 0,\; G \ge 0$

Fix F, solve for G; fix G, solve for F. Lee & Seung (2000) propose:

$F_{ik} \leftarrow F_{ik}\, \frac{(X G)_{ik}}{(F G^T G)_{ik}} \qquad\qquad G_{jk} \leftarrow G_{jk}\, \frac{(X^T F)_{jk}}{(G F^T F)_{jk}}$
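A direct numpy transcription of these updates; the small eps guard against zero denominators is my addition, not part of the slide:

```python
import numpy as np

def nmf_multiplicative(X, k, n_iter=200, eps=1e-9, seed=0):
    """Lee & Seung (2000) multiplicative updates for min ||X - F G^T||^2."""
    rng = np.random.default_rng(seed)
    p, n = X.shape
    F = rng.random((p, k))
    G = rng.random((n, k))
    for _ in range(n_iter):
        G *= (X.T @ F) / (G @ (F.T @ F) + eps)  # G_jk <- G_jk (X^T F)_jk / (G F^T F)_jk
        F *= (X @ G) / (F @ (G.T @ G) + eps)    # F_ik <- F_ik (X G)_ik  / (F G^T G)_ik
    return F, G

X = np.random.default_rng(5).random((40, 60))   # toy nonnegative data
F, G = nmf_multiplicative(X, k=4)
print(np.linalg.norm(X - F @ G.T, 'fro'))       # objective is non-increasing
```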


Symmetric NMF

$J = \| W - H H^T \|^2, \quad H \ge 0$

Constrained optimization; the first-order KKT (complementary slackness) condition:

$0 = \left( \frac{\partial J}{\partial H} \right)_{ik} H_{ik} = \big( -4 W H + 4 H H^T H \big)_{ik} H_{ik}$

Gradient descent,

$H_{ik} \leftarrow H_{ik} - \varepsilon_{ik} \frac{\partial J}{\partial H_{ik}}, \qquad \varepsilon_{ik} = \beta\, \frac{H_{ik}}{4 (H H^T H)_{ik}},$

gives the multiplicative update

$H_{ik} \leftarrow H_{ik} \left( 1 - \beta + \beta\, \frac{(W H)_{ik}}{(H H^T H)_{ik}} \right)$
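A minimal sketch of this symmetric update; the choice β = 1/2 (the damping value used by Ding, He & Simon) and the eps guard are assumptions added for the demo:

```python
import numpy as np

def symnmf(W, k, beta=0.5, n_iter=300, eps=1e-9, seed=0):
    """Symmetric NMF, min ||W - H H^T||^2 with H >= 0, via the slide's update."""
    rng = np.random.default_rng(seed)
    H = rng.random((W.shape[0], k))
    for _ in range(n_iter):
        H *= 1 - beta + beta * (W @ H) / (H @ (H.T @ H) + eps)
    return H

A = np.random.default_rng(6).random((30, 30))
W = (A + A.T) / 2                       # symmetric nonnegative similarity
H = symnmf(W, k=3)
print(np.linalg.norm(W - H @ H.T, 'fro'))
```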
Summary


           •   NMF is a new low-rank approximation
           •   The holistic picture (vs. parts-based)
           •   NMF is equivalent to spectral clustering
           •   Main advantage: soft clustering




