@@ -647,21 +647,20 @@ components with some sparsity:
Non-negative matrix factorization (NMF or NNMF)
===============================================

+ NMF with the Frobenius norm
+ ---------------------------
+
:class:`NMF` is an alternative approach to decomposition that assumes that the
data and the components are non-negative. :class:`NMF` can be plugged in
instead of :class:`PCA` or its variants, in the cases where the data matrix
- does not contain negative values.
- It finds a decomposition of samples :math:`X`
- into two matrices :math:`W` and :math:`H` of non-negative elements,
- by optimizing for the squared Frobenius norm:
+ does not contain negative values. It finds a decomposition of samples
+ :math:`X` into two matrices :math:`W` and :math:`H` of non-negative elements,
+ by optimizing the distance :math:`d` between :math:`X` and the matrix product
+ :math:`WH`. The most widely used distance function is the squared Frobenius
+ norm, which is an obvious extension of the Euclidean norm to matrices:

.. math::
- \arg\min_{W,H} \frac{1}{2} ||X - WH||_{Fro}^2 = \frac{1}{2} \sum_{i,j} (X_{ij} - {WH}_{ij})^2
-
- This norm is an obvious extension of the Euclidean norm to matrices. (Other
- optimization objectives have been suggested in the NMF literature, in
- particular Kullback-Leibler divergence, but these are not currently
- implemented.)
+ d_{Fro}(X, Y) = \frac{1}{2} ||X - Y||_{Fro}^2 = \frac{1}{2} \sum_{i,j} (X_{ij} - {Y}_{ij})^2
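+
+ As a minimal sketch of the estimator API (the data values below are arbitrary,
+ purely for illustration), the factorization is obtained with
+ ``fit_transform``, the components being stored in ``components_``::
+
+     >>> import numpy as np
+     >>> from sklearn.decomposition import NMF
+     >>> X = np.array([[1, 1], [2, 1], [3, 1.2], [4, 1], [5, 0.8], [6, 1]])
+     >>> model = NMF(n_components=2, init='random', random_state=0)
+     >>> W = model.fit_transform(X)   # non-negative representation of the samples
+     >>> H = model.components_        # non-negative components
+     >>> W.shape, H.shape
+     ((6, 2), (2, 2))
+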
Unlike :class:`PCA`, the representation of a vector is obtained in an additive
fashion, by superimposing the components, without subtracting. Such additive
@@ -715,7 +714,7 @@ and the intensity of the regularization with the :attr:`alpha`
and the regularized objective function is:

.. math::
- \frac{1}{2} ||X - WH||_{Fro}^2
+ d_{Fro}(X, WH)
+ \alpha \rho ||W||_1 + \alpha \rho ||H||_1
+ \frac{\alpha(1-\rho)}{2} ||W||_{Fro}^2
+ \frac{\alpha(1-\rho)}{2} ||H||_{Fro}^2
@@ -724,10 +723,57 @@ and the regularized objective function is:
:func:`non_negative_factorization` allows a finer control through the
:attr:`regularization` attribute, and may regularize only W, only H, or both.
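+
+ As a rough sketch, assuming :attr:`alpha` maps to :math:`\alpha`,
+ :attr:`l1_ratio` to :math:`\rho`, and that passing
+ ``regularization='components'`` to :func:`non_negative_factorization`
+ restricts the penalty to H (the values below are arbitrary), this could look
+ like::
+
+     >>> import numpy as np
+     >>> from sklearn.decomposition import NMF, non_negative_factorization
+     >>> X = np.abs(np.random.RandomState(0).randn(6, 4))  # any non-negative data
+     >>> # elastic-net style penalty on both W and H
+     >>> model = NMF(n_components=2, alpha=0.1, l1_ratio=0.5, random_state=0)
+     >>> W = model.fit_transform(X)
+     >>> # finer control: penalize only the components H
+     >>> W, H, n_iter = non_negative_factorization(
+     ...     X, n_components=2, alpha=0.1, l1_ratio=0.5,
+     ...     regularization='components', random_state=0)
+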
+ NMF with a beta-divergence
+ --------------------------
+
+ As described previously, the most widely used distance function is the squared
+ Frobenius norm, which is an obvious extension of the Euclidean norm to
+ matrices:
+
+ .. math::
+ d_{Fro}(X, Y) = \frac{1}{2} ||X - Y||_{Fro}^2 = \frac{1}{2} \sum_{i,j} (X_{ij} - {Y}_{ij})^2
+
+ However, NMF can also be used with a different function to measure the
+ distance between X and the matrix product WH. Another typical distance
+ function used in NMF is the (generalized) Kullback-Leibler (KL) divergence,
+ also referred to as I-divergence:
+
+ .. math::
+ d_{KL}(X, Y) = \sum_{i,j} (X_{ij} \log(\frac{X_{ij}}{Y_{ij}}) - X_{ij} + Y_{ij})
+
+ Another, less commonly used, distance is the Itakura-Saito (IS) divergence:
+
+ .. math::
+ d_{IS}(X, Y) = \sum_{i,j} (\frac{X_{ij}}{Y_{ij}} - \log(\frac{X_{ij}}{Y_{ij}}) - 1)
+
+ These three distances are special cases of the beta-divergence family, with
+ :math:`\beta = 2, 1, 0` respectively [Fevotte, 2011]. The beta-divergence is
+ defined by:
+
+ .. math::
+ d_{\beta}(X, Y) = \sum_{i,j} \frac{1}{\beta(\beta - 1)}(X_{ij}^\beta + (\beta-1)Y_{ij}^\beta - \beta X_{ij} Y_{ij}^{\beta - 1})
+
+ Note that this definition is not valid if :math:`\beta \in (0; 1)`, yet it can be
+ continuously extended to the definitions of :math:`d_{KL}` and :math:`d_{IS}`
+ respectively.
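+
+ Purely as an illustrative sketch of the definitions above (this helper is not
+ a scikit-learn function; strictly positive entries are assumed so that the
+ logarithms are defined), the special cases can be checked with NumPy::
+
+     import numpy as np
+
+     def beta_divergence(X, Y, beta):
+         """Beta-divergence d_beta(X, Y) between arrays with positive entries."""
+         X, Y = np.asarray(X, dtype=float), np.asarray(Y, dtype=float)
+         if beta == 2:    # squared Frobenius norm
+             return 0.5 * np.sum((X - Y) ** 2)
+         if beta == 1:    # generalized Kullback-Leibler (I-divergence)
+             return np.sum(X * np.log(X / Y) - X + Y)
+         if beta == 0:    # Itakura-Saito
+             return np.sum(X / Y - np.log(X / Y) - 1)
+         # generic formula, not valid for beta in (0, 1) (cf. the note above)
+         return np.sum(X ** beta + (beta - 1) * Y ** beta
+                       - beta * X * Y ** (beta - 1)) / (beta * (beta - 1))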
+
+ :class:`NMF` implements three solvers, using Projected Gradient ('pg')
+ (deprecated, it will be removed in 0.19), Coordinate Descent ('cd'), and
+ Multiplicative Update ('mu'). The 'mu' solver can optimize every
+ beta-divergence, including of course the Frobenius norm (:math:`\beta=2`), the
+ (generalized) Kullback-Leibler divergence (:math:`\beta=1`) and the
+ Itakura-Saito divergence (:math:`\beta=0`). Note that for
+ :math:`\beta \in (1; 2)`, the 'mu' solver is significantly faster.
+
+ The 'cd' and 'pg' solvers can only optimize the Frobenius norm. Due to the
+ underlying non-convexity of NMF, the different solvers may converge to
+ different minima, even when optimizing the same distance function.
+
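+ As a rough sketch (the ``beta_loss`` parameter name and its accepted values
+ are an assumption here, not something stated in the text above), selecting the
+ solver and the divergence to optimize might look like::
+
+     >>> from sklearn.decomposition import NMF
+     >>> # 'mu' can minimize any beta-divergence; 'cd' only the Frobenius norm
+     >>> model = NMF(n_components=2, solver='mu',
+     ...             beta_loss='kullback-leibler', random_state=0)
+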
.. topic:: Examples:

* :ref:`example_decomposition_plot_faces_decomposition.py`
* :ref:`example_applications_topics_extraction_with_nmf_lda.py`
+ * :ref:`example_decomposition_plot_beta_divergence.py`

.. topic:: References:
@@ -753,6 +799,10 @@ and the regularized objective function is:
<http://www.bsp.brain.riken.jp/publications/2009/Cichocki-Phan-IEICE_col.pdf>`_
A. Cichocki, P. Anh-Huy, 2009

+ * `"Algorithms for nonnegative matrix factorization with the beta-divergence"
+ <http://arxiv.org/pdf/1010.1763v3.pdf>`_
+ C. Fevotte, J. Idier, 2011
+
.. _LatentDirichletAllocation: