@@ -648,27 +648,26 @@ components with some sparsity:
Non-negative matrix factorization (NMF or NNMF)
===============================================

- :class:`NMF` is an alternative approach to decomposition that assumes that the
+ NMF with the Frobenius norm
+ ---------------------------
+
+ :class:`NMF` [1]_ is an alternative approach to decomposition that assumes that the
data and the components are non-negative. :class:`NMF` can be plugged in
instead of :class:`PCA` or its variants, in the cases where the data matrix
- does not contain negative values.
- It finds a decomposition of samples :math:`X`
- into two matrices :math:`W` and :math:`H` of non-negative elements,
- by optimizing for the squared Frobenius norm:
+ does not contain negative values. It finds a decomposition of samples
+ :math:`X` into two matrices :math:`W` and :math:`H` of non-negative elements,
+ by optimizing the distance :math:`d` between :math:`X` and the matrix product
+ :math:`WH`. The most widely used distance function is the squared Frobenius
+ norm, which is an obvious extension of the Euclidean norm to matrices:

.. math::
-     \arg\min_{W,H} \frac{1}{2} ||X - WH||_{Fro}^2 = \frac{1}{2} \sum_{i,j} (X_{ij} - {WH}_{ij})^2
-
- This norm is an obvious extension of the Euclidean norm to matrices. (Other
- optimization objectives have been suggested in the NMF literature, in
- particular Kullback-Leibler divergence, but these are not currently
- implemented.)
+     d_{Fro}(X, Y) = \frac{1}{2} ||X - Y||_{Fro}^2 = \frac{1}{2} \sum_{i,j} (X_{ij} - Y_{ij})^2
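+
+ The two forms of this distance agree; as a quick illustration, here is a minimal
+ NumPy sketch on made-up matrices (the names are illustrative and not tied to the
+ estimator API)::
+
+     >>> import numpy as np
+     >>> rng = np.random.RandomState(0)
+     >>> X = rng.rand(6, 4)                    # non-negative data matrix
+     >>> Y = rng.rand(6, 4)                    # e.g. a reconstruction W @ H
+     >>> d_elem = 0.5 * np.sum((X - Y) ** 2)   # elementwise definition
+     >>> d_fro = 0.5 * np.linalg.norm(X - Y, 'fro') ** 2   # matrix-norm form
+     >>> bool(np.isclose(d_elem, d_fro))
+     True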

Unlike :class:`PCA`, the representation of a vector is obtained in an additive
fashion, by superimposing the components, without subtracting. Such additive
models are efficient for representing images and text.

- It has been observed in [Hoyer, 04] that, when carefully constrained,
+ It has been observed in [Hoyer, 2004] [2]_ that, when carefully constrained,
:class:`NMF` can produce a parts-based representation of the dataset,
resulting in interpretable models. The following example displays 16
sparse components found by :class:`NMF` from the images in the Olivetti
@@ -686,8 +685,8 @@ faces dataset, in comparison with the PCA eigenfaces.

The :attr:`init` attribute determines the initialization method applied, which
- has a great impact on the performance of the method. :class:`NMF` implements
- the method Nonnegative Double Singular Value Decomposition. NNDSVD is based on
+ has a great impact on the performance of the method. :class:`NMF` implements the
+ method Nonnegative Double Singular Value Decomposition. NNDSVD [4]_ is based on
two SVD processes, one approximating the data matrix, the other approximating
positive sections of the resulting partial SVD factors utilizing an algebraic
property of unit rank matrices. The basic NNDSVD algorithm is better fit for
@@ -696,6 +695,11 @@ the mean of all elements of the data), and NNDSVDar (in which the zeros are set
to random perturbations less than the mean of the data divided by 100) are
recommended in the dense case.

+ Note that the Multiplicative Update ('mu') solver cannot update zeros present in
+ the initialization, so it leads to poorer results when used jointly with the
+ basic NNDSVD algorithm which introduces a lot of zeros; in this case, NNDSVDa or
+ NNDSVDar should be preferred.
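+
+ For a dense data matrix, this combination could look as follows (a minimal
+ sketch on synthetic data; the parameter values are only illustrative)::
+
+     >>> import numpy as np
+     >>> from sklearn.decomposition import NMF
+     >>> X = np.abs(np.random.RandomState(0).randn(20, 10))   # dense, non-negative
+     >>> model = NMF(n_components=5, init='nndsvda', solver='mu', random_state=0)
+     >>> W = model.fit_transform(X)
+     >>> H = model.components_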
+
:class:`NMF` can also be initialized with correctly scaled random non-negative
matrices by setting :attr:`init="random"`. An integer seed or a
``RandomState`` can also be passed to :attr:`random_state` to control
@@ -716,7 +720,7 @@ and the intensity of the regularization with the :attr:`alpha`
and the regularized objective function is:

.. math::
-     \frac{1}{2} ||X - WH||_{Fro}^2
+     d_{Fro}(X, WH)
    + \alpha \rho ||W||_1 + \alpha \rho ||H||_1
    + \frac{\alpha(1-\rho)}{2} ||W||_{Fro}^2
    + \frac{\alpha(1-\rho)}{2} ||H||_{Fro}^2
@@ -725,35 +729,100 @@ and the regularized objective function is:
:func:`non_negative_factorization` allows a finer control through the
:attr:`regularization` attribute, and may regularize only W, only H, or both.
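+
+ As an illustration, a minimal sketch (``alpha``, ``l1_ratio`` and
+ ``regularization`` are the parameter names used by this version of the API;
+ the values and the synthetic data are only illustrative)::
+
+     >>> import numpy as np
+     >>> from sklearn.decomposition import NMF, non_negative_factorization
+     >>> X = np.abs(np.random.RandomState(0).randn(20, 10))
+     >>> # the estimator penalizes both W and H with mixed L1/L2 terms
+     >>> model = NMF(n_components=5, alpha=0.1, l1_ratio=0.5, random_state=0)
+     >>> W = model.fit_transform(X)
+     >>> # the function form can restrict the penalty to H (the components) only
+     >>> W, H, n_iter = non_negative_factorization(
+     ...     X, n_components=5, alpha=0.1, l1_ratio=0.5,
+     ...     regularization='components', random_state=0)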

+ NMF with a beta-divergence
+ --------------------------
+
+ As described previously, the most widely used distance function is the squared
+ Frobenius norm, which is an obvious extension of the Euclidean norm to
+ matrices:
+
+ .. math::
+     d_{Fro}(X, Y) = \frac{1}{2} ||X - Y||_{Fro}^2 = \frac{1}{2} \sum_{i,j} (X_{ij} - Y_{ij})^2
+
+ Other distance functions can be used in NMF as, for example, the (generalized)
+ Kullback-Leibler (KL) divergence, also referred to as I-divergence:
+
+ .. math::
+     d_{KL}(X, Y) = \sum_{i,j} (X_{ij} \log(\frac{X_{ij}}{Y_{ij}}) - X_{ij} + Y_{ij})
+
+ Or, the Itakura-Saito (IS) divergence:
+
+ .. math::
+     d_{IS}(X, Y) = \sum_{i,j} (\frac{X_{ij}}{Y_{ij}} - \log(\frac{X_{ij}}{Y_{ij}}) - 1)
+
+ These three distances are special cases of the beta-divergence family, with
+ :math:`\beta = 2, 1, 0` respectively [6]_. The beta-divergence is defined by:
+
+ .. math::
+     d_{\beta}(X, Y) = \sum_{i,j} \frac{1}{\beta(\beta - 1)}(X_{ij}^\beta + (\beta-1)Y_{ij}^\beta - \beta X_{ij} Y_{ij}^{\beta - 1})
+
+ .. figure:: ../auto_examples/decomposition/images/sphx_glr_plot_beta_divergence_001.png
+     :target: ../auto_examples/decomposition/plot_beta_divergence.html
+     :align: center
+     :scale: 75%
+
+ Note that this definition is not valid if :math:`\beta \in (0; 1)`, yet it can
+ be continuously extended to the definitions of :math:`d_{KL}` and :math:`d_{IS}`
+ respectively.
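+
+ A small NumPy sketch can make the special cases concrete (the
+ ``beta_divergence`` helper below is written only for illustration and is not
+ part of the public API)::
+
+     >>> import numpy as np
+     >>> def beta_divergence(X, Y, beta):
+     ...     if beta == 0:                       # Itakura-Saito
+     ...         return np.sum(X / Y - np.log(X / Y) - 1)
+     ...     if beta == 1:                       # generalized Kullback-Leibler
+     ...         return np.sum(X * np.log(X / Y) - X + Y)
+     ...     return np.sum(X ** beta + (beta - 1) * Y ** beta
+     ...                   - beta * X * Y ** (beta - 1)) / (beta * (beta - 1))
+     >>> rng = np.random.RandomState(0)
+     >>> X, Y = rng.rand(4, 3) + 0.1, rng.rand(4, 3) + 0.1   # strictly positive
+     >>> # beta = 2 recovers the squared Frobenius distance
+     >>> bool(np.isclose(beta_divergence(X, Y, 2), 0.5 * np.sum((X - Y) ** 2)))
+     True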
768
+
769
+ :class: `NMF ` implements two solvers, using Coordinate Descent ('cd') [5 ]_, and
770
+ Multiplicative Update ('mu') [6 ]_. The 'mu' solver can optimize every
771
+ beta-divergence, including of course the Frobenius norm (:math: `\beta =2 `), the
772
+ (generalized) Kullback-Leibler divergence (:math: `\beta =1 `) and the
773
+ Itakura-Saito divergence (:math: `\beta =0 `). Note that for
774
+ :math: `\beta \in (1 ; 2 )`, the 'mu' solver is significantly faster than for other
775
+ values of :math: `\beta `. Note also that with a negative (or 0, i.e.
776
+ 'itakura-saito') :math: `\beta `, the input matrix cannot contain zero values.
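+
+ For example, a Kullback-Leibler factorization can be requested through the
+ ``beta_loss`` parameter (a minimal sketch on synthetic data)::
+
+     >>> import numpy as np
+     >>> from sklearn.decomposition import NMF
+     >>> X = np.abs(np.random.RandomState(0).randn(20, 10))
+     >>> model = NMF(n_components=5, solver='mu', beta_loss='kullback-leibler',
+     ...             random_state=0)
+     >>> W = model.fit_transform(X)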
+
+ The 'cd' solver can only optimize the Frobenius norm. Due to the
+ underlying non-convexity of NMF, the different solvers may converge to
+ different minima, even when optimizing the same distance function.
+
+ NMF is best used with the ``fit_transform`` method, which returns the matrix W.
+ The matrix H is stored into the fitted model in the ``components_`` attribute;
+ the method ``transform`` will decompose a new matrix X_new based on these
+ stored components::
+
+     >>> import numpy as np
+     >>> X = np.array([[1, 1], [2, 1], [3, 1.2], [4, 1], [5, 0.8], [6, 1]])
+     >>> from sklearn.decomposition import NMF
+     >>> model = NMF(n_components=2, init='random', random_state=0)
+     >>> W = model.fit_transform(X)
+     >>> H = model.components_
+     >>> X_new = np.array([[1, 0], [1, 6.1], [1, 0], [1, 4], [3.2, 1], [0, 4]])
+     >>> W_new = model.transform(X_new)
+
.. topic:: Examples:

    * :ref:`sphx_glr_auto_examples_decomposition_plot_faces_decomposition.py`
    * :ref:`sphx_glr_auto_examples_applications_topics_extraction_with_nmf_lda.py`
+    * :ref:`sphx_glr_auto_examples_decomposition_plot_beta_divergence.py`

.. topic:: References:

-     * `"Learning the parts of objects by non-negative matrix factorization"
+     .. [1] `"Learning the parts of objects by non-negative matrix factorization"
      <http://www.columbia.edu/~jwp2128/Teaching/W4721/papers/nmf_nature.pdf>`_
      D. Lee, S. Seung, 1999

-     * `"Non-negative Matrix Factorization with Sparseness Constraints"
+     .. [2] `"Non-negative Matrix Factorization with Sparseness Constraints"
      <http://www.jmlr.org/papers/volume5/hoyer04a/hoyer04a.pdf>`_
      P. Hoyer, 2004

-     * `"Projected gradient methods for non-negative matrix factorization"
-       <http://www.csie.ntu.edu.tw/~cjlin/nmf/>`_
-       C.-J. Lin, 2007
-
-     * `"SVD based initialization: A head start for nonnegative
+     .. [4] `"SVD based initialization: A head start for nonnegative
      matrix factorization"
      <http://scgroup.hpclab.ceid.upatras.gr/faculty/stratis/Papers/HPCLAB020107.pdf>`_
      C. Boutsidis, E. Gallopoulos, 2008

-     * `"Fast local algorithms for large scale nonnegative matrix and tensor
+     .. [5] `"Fast local algorithms for large scale nonnegative matrix and tensor
      factorizations."
      <http://www.bsp.brain.riken.jp/publications/2009/Cichocki-Phan-IEICE_col.pdf>`_
      A. Cichocki, P. Anh-Huy, 2009

+     .. [6] `"Algorithms for nonnegative matrix factorization with the beta-divergence"
+       <http://arxiv.org/pdf/1010.1763v3.pdf>`_
+       C. Fevotte, J. Idier, 2011

.. _LatentDirichletAllocation: