[MRG+1] Add multiplicative-update solver in NMF, with all beta-diverg… · Sundrique/scikit-learn@d52e939 · GitHub

Commit d52e939

TomDLT authored and Sundrique committed
[MRG+1] Add multiplicative-update solver in NMF, with all beta-divergence (scikit-learn#5295)
1 parent 42115c5 commit d52e939

File tree

7 files changed: +1092, -328 lines

doc/modules/classes.rst

Lines changed: 1 addition & 2 deletions
@@ -322,7 +322,6 @@ Samples generator
 
    decomposition.PCA
    decomposition.IncrementalPCA
-   decomposition.ProjectedGradientNMF
    decomposition.KernelPCA
    decomposition.FactorAnalysis
    decomposition.FastICA
@@ -1058,7 +1057,7 @@ See the :ref:`metrics` section of the user guide for further details.
    neighbors.DistanceMetric
    neighbors.KernelDensity
    neighbors.LocalOutlierFactor
-
+
 .. autosummary::
    :toctree: generated/
    :template: function.rst

doc/modules/decomposition.rst

Lines changed: 92 additions & 23 deletions
@@ -648,27 +648,26 @@ components with some sparsity:
 Non-negative matrix factorization (NMF or NNMF)
 ===============================================
 
-:class:`NMF` is an alternative approach to decomposition that assumes that the
+NMF with the Frobenius norm
+---------------------------
+
+:class:`NMF` [1]_ is an alternative approach to decomposition that assumes that the
 data and the components are non-negative. :class:`NMF` can be plugged in
 instead of :class:`PCA` or its variants, in the cases where the data matrix
-does not contain negative values.
-It finds a decomposition of samples :math:`X`
-into two matrices :math:`W` and :math:`H` of non-negative elements,
-by optimizing for the squared Frobenius norm:
+does not contain negative values. It finds a decomposition of samples
+:math:`X` into two matrices :math:`W` and :math:`H` of non-negative elements,
+by optimizing the distance :math:`d` between :math:`X` and the matrix product
+:math:`WH`. The most widely used distance function is the squared Frobenius
+norm, which is an obvious extension of the Euclidean norm to matrices:
 
 .. math::
-    \arg\min_{W,H} \frac{1}{2} ||X - WH||_{Fro}^2 = \frac{1}{2} \sum_{i,j} (X_{ij} - {WH}_{ij})^2
-
-This norm is an obvious extension of the Euclidean norm to matrices. (Other
-optimization objectives have been suggested in the NMF literature, in
-particular Kullback-Leibler divergence, but these are not currently
-implemented.)
+    d_{Fro}(X, Y) = \frac{1}{2} ||X - Y||_{Fro}^2 = \frac{1}{2} \sum_{i,j} (X_{ij} - {Y}_{ij})^2
 
 Unlike :class:`PCA`, the representation of a vector is obtained in an additive
 fashion, by superimposing the components, without subtracting. Such additive
 models are efficient for representing images and text.
 
-It has been observed in [Hoyer, 04] that, when carefully constrained,
+It has been observed in [Hoyer, 2004] [2]_ that, when carefully constrained,
 :class:`NMF` can produce a parts-based representation of the dataset,
 resulting in interpretable models. The following example displays 16
 sparse components found by :class:`NMF` from the images in the Olivetti
@@ -686,8 +685,8 @@ faces dataset, in comparison with the PCA eigenfaces.
 
 
 The :attr:`init` attribute determines the initialization method applied, which
-has a great impact on the performance of the method. :class:`NMF` implements
-the method Nonnegative Double Singular Value Decomposition. NNDSVD is based on
+has a great impact on the performance of the method. :class:`NMF` implements the
+method Nonnegative Double Singular Value Decomposition. NNDSVD [4]_ is based on
 two SVD processes, one approximating the data matrix, the other approximating
 positive sections of the resulting partial SVD factors utilizing an algebraic
 property of unit rank matrices. The basic NNDSVD algorithm is better fit for
@@ -696,6 +695,11 @@ the mean of all elements of the data), and NNDSVDar (in which the zeros are set
 to random perturbations less than the mean of the data divided by 100) are
 recommended in the dense case.
 
+Note that the Multiplicative Update ('mu') solver cannot update zeros present in
+the initialization, so it leads to poorer results when used jointly with the
+basic NNDSVD algorithm which introduces a lot of zeros; in this case, NNDSVDa or
+NNDSVDar should be preferred.
+
 :class:`NMF` can also be initialized with correctly scaled random non-negative
 matrices by setting :attr:`init="random"`. An integer seed or a
 ``RandomState`` can also be passed to :attr:`random_state` to control
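(Editor's note, not part of the diff.) A minimal sketch of the initialization advice above, using the ``init`` and ``solver`` values this commit documents; the data and ``n_components`` are illustrative only:

    import numpy as np
    from sklearn.decomposition import NMF

    X = np.abs(np.random.RandomState(0).randn(20, 10))  # any non-negative matrix

    # 'nndsvda' fills the zeros of plain NNDSVD with the mean of X, so the
    # multiplicative updates are free to move every entry of W and H.
    model = NMF(n_components=5, init='nndsvda', solver='mu', random_state=0)
    W = model.fit_transform(X)
    H = model.components_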
@@ -716,7 +720,7 @@ and the intensity of the regularization with the :attr:`alpha`
 and the regularized objective function is:
 
 .. math::
-    \frac{1}{2}||X - WH||_{Fro}^2
+    d_{Fro}(X, WH)
     + \alpha \rho ||W||_1 + \alpha \rho ||H||_1
     + \frac{\alpha(1-\rho)}{2} ||W||_{Fro} ^ 2
     + \frac{\alpha(1-\rho)}{2} ||H||_{Fro} ^ 2
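(Editor's note, not part of the diff.) A hedged sketch of the regularization controls described above; it assumes the ``alpha``, ``l1_ratio`` and ``regularization`` arguments of :func:`non_negative_factorization` from this era of the API, and the string ``'transformation'`` (regularize W only) is an assumption, not something stated in the diff:

    import numpy as np
    from sklearn.decomposition import non_negative_factorization

    X = np.abs(np.random.RandomState(42).randn(30, 8))

    # alpha sets the overall penalty strength; l1_ratio (the rho above)
    # balances the L1 term against the L2 term; only W is penalized here.
    W, H, n_iter = non_negative_factorization(
        X, n_components=4, alpha=0.1, l1_ratio=0.5,
        regularization='transformation', random_state=0)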
@@ -725,35 +729,100 @@ and the regularized objective function is:
 :func:`non_negative_factorization` allows a finer control through the
 :attr:`regularization` attribute, and may regularize only W, only H, or both.
 
+NMF with a beta-divergence
+--------------------------
+
+As described previously, the most widely used distance function is the squared
+Frobenius norm, which is an obvious extension of the Euclidean norm to
+matrices:
+
+.. math::
+    d_{Fro}(X, Y) = \frac{1}{2} ||X - Y||_{Fro}^2 = \frac{1}{2} \sum_{i,j} (X_{ij} - {Y}_{ij})^2
+
+Other distance functions can be used in NMF as, for example, the (generalized)
+Kullback-Leibler (KL) divergence, also referred to as I-divergence:
+
+.. math::
+    d_{KL}(X, Y) = \sum_{i,j} (X_{ij} \log(\frac{X_{ij}}{Y_{ij}}) - X_{ij} + Y_{ij})
+
+Or, the Itakura-Saito (IS) divergence:
+
+.. math::
+    d_{IS}(X, Y) = \sum_{i,j} (\frac{X_{ij}}{Y_{ij}} - \log(\frac{X_{ij}}{Y_{ij}}) - 1)
+
+These three distances are special cases of the beta-divergence family, with
+:math:`\beta = 2, 1, 0` respectively [6]_. The beta-divergence is
+defined by:
+
+.. math::
+    d_{\beta}(X, Y) = \sum_{i,j} \frac{1}{\beta(\beta - 1)}(X_{ij}^\beta + (\beta-1)Y_{ij}^\beta - \beta X_{ij} Y_{ij}^{\beta - 1})
+
+.. figure:: ../auto_examples/decomposition/images/sphx_glr_plot_beta_divergence_001.png
+    :target: ../auto_examples/decomposition/plot_beta_divergence.html
+    :align: center
+    :scale: 75%
+
+Note that this definition is not valid if :math:`\beta \in (0; 1)`, yet it can
+be continuously extended to the definitions of :math:`d_{KL}` and :math:`d_{IS}`
+respectively.
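(Editor's note, not part of the diff.) For readers who want to evaluate the generic formula numerically, a direct NumPy transcription of :math:`d_{\beta}`, valid for :math:`\beta \notin \{0, 1\}` (the KL and IS expressions above cover the two excluded values):

    import numpy as np

    def beta_divergence(X, Y, beta):
        # element-wise d_beta from the formula above, summed over all entries
        X = np.asarray(X, dtype=float)
        Y = np.asarray(Y, dtype=float)
        return np.sum((X ** beta + (beta - 1) * Y ** beta
                       - beta * X * Y ** (beta - 1)) / (beta * (beta - 1)))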
+
+:class:`NMF` implements two solvers, using Coordinate Descent ('cd') [5]_, and
+Multiplicative Update ('mu') [6]_. The 'mu' solver can optimize every
+beta-divergence, including of course the Frobenius norm (:math:`\beta=2`), the
+(generalized) Kullback-Leibler divergence (:math:`\beta=1`) and the
+Itakura-Saito divergence (:math:`\beta=0`). Note that for
+:math:`\beta \in (1; 2)`, the 'mu' solver is significantly faster than for other
+values of :math:`\beta`. Note also that with a negative (or 0, i.e.
+'itakura-saito') :math:`\beta`, the input matrix cannot contain zero values.
+
+The 'cd' solver can only optimize the Frobenius norm. Due to the
+underlying non-convexity of NMF, the different solvers may converge to
+different minima, even when optimizing the same distance function.
+
+NMF is best used with the ``fit_transform`` method, which returns the matrix W.
+The matrix H is stored into the fitted model in the ``components_`` attribute;
+the method ``transform`` will decompose a new matrix X_new based on these
+stored components::
+
+    >>> import numpy as np
+    >>> X = np.array([[1, 1], [2, 1], [3, 1.2], [4, 1], [5, 0.8], [6, 1]])
+    >>> from sklearn.decomposition import NMF
+    >>> model = NMF(n_components=2, init='random', random_state=0)
+    >>> W = model.fit_transform(X)
+    >>> H = model.components_
+    >>> X_new = np.array([[1, 0], [1, 6.1], [1, 0], [1, 4], [3.2, 1], [0, 4]])
+    >>> W_new = model.transform(X_new)
+
 .. topic:: Examples:
 
     * :ref:`sphx_glr_auto_examples_decomposition_plot_faces_decomposition.py`
     * :ref:`sphx_glr_auto_examples_applications_topics_extraction_with_nmf_lda.py`
+    * :ref:`sphx_glr_auto_examples_decomposition_plot_beta_divergence.py`
 
 .. topic:: References:
 
-    * `"Learning the parts of objects by non-negative matrix factorization"
+    .. [1] `"Learning the parts of objects by non-negative matrix factorization"
      <http://www.columbia.edu/~jwp2128/Teaching/W4721/papers/nmf_nature.pdf>`_
      D. Lee, S. Seung, 1999
 
-    * `"Non-negative Matrix Factorization with Sparseness Constraints"
+    .. [2] `"Non-negative Matrix Factorization with Sparseness Constraints"
      <http://www.jmlr.org/papers/volume5/hoyer04a/hoyer04a.pdf>`_
      P. Hoyer, 2004
 
-    * `"Projected gradient methods for non-negative matrix factorization"
-      <http://www.csie.ntu.edu.tw/~cjlin/nmf/>`_
-      C.-J. Lin, 2007
-
-    * `"SVD based initialization: A head start for nonnegative
+    .. [4] `"SVD based initialization: A head start for nonnegative
      matrix factorization"
      <http://scgroup.hpclab.ceid.upatras.gr/faculty/stratis/Papers/HPCLAB020107.pdf>`_
      C. Boutsidis, E. Gallopoulos, 2008
 
-    * `"Fast local algorithms for large scale nonnegative matrix and tensor
+    .. [5] `"Fast local algorithms for large scale nonnegative matrix and tensor
      factorizations."
      <http://www.bsp.brain.riken.jp/publications/2009/Cichocki-Phan-IEICE_col.pdf>`_
      A. Cichocki, P. Anh-Huy, 2009
 
+    .. [6] `"Algorithms for nonnegative matrix factorization with the beta-divergence"
+      <http://arxiv.org/pdf/1010.1763v3.pdf>`_
+      C. Fevotte, J. Idier, 2011
+
 
 .. _LatentDirichletAllocation:

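(Editor's note, not part of the commit.) Putting the documentation above together, a minimal end-to-end sketch with the options this commit introduces (``solver='mu'`` together with a ``beta_loss``); the data, ``n_components`` and ``max_iter`` values are illustrative only:

    import numpy as np
    from sklearn.decomposition import NMF

    X = np.abs(np.random.RandomState(0).randn(6, 5))

    # Multiplicative updates minimizing the generalized Kullback-Leibler
    # divergence (beta = 1); 'frobenius' and 'itakura-saito' are the other
    # named losses, and a float beta is also accepted.
    model = NMF(n_components=3, solver='mu', beta_loss='kullback-leibler',
                init='nndsvda', max_iter=500, random_state=0)
    W = model.fit_transform(X)
    H = model.components_
    print(model.reconstruction_err_)  # training error in the chosen divergence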
doc/whats_new.rst

Lines changed: 16 additions & 10 deletions
@@ -35,6 +35,12 @@ New features
      detection based on nearest neighbors.
      :issue:`5279` by `Nicolas Goix`_ and `Alexandre Gramfort`_.
 
+   - The new solver ``mu`` implements a Multiplicative Update in
+     :class:`decomposition.NMF`, allowing the optimization of all
+     beta-divergences, including the Frobenius norm, the generalized
+     Kullback-Leibler divergence and the Itakura-Saito divergence.
+     By `Tom Dupre la Tour`_.
+
 Enhancements
 ............
 
@@ -162,7 +168,7 @@ Bug fixes
      with SVD and Eigen solver are now of the same length. :issue:`7632`
      by :user:`JPFrancoia <JPFrancoia>`
 
-   - Fixes issue in :ref:`univariate_feature_selection` where score
+   - Fixes issue in :ref:`univariate_feature_selection` where score
      functions were not accepting multi-label targets. :issue:`7676`
      by `Mohammed Affan`_
 
@@ -392,7 +398,7 @@ Other estimators
 
   - New :class:`mixture.GaussianMixture` and :class:`mixture.BayesianGaussianMixture`
     replace former mixture models, employing faster inference
-    for sounder results. :issue:`7295` by :user:`Wei Xue <xuewei4d>` and
+    for sounder results. :issue:`7295` by :user:`Wei Xue <xuewei4d>` and
    :user:`Thierry Guillemot <tguillemot>`.
 
   - Class :class:`decomposition.RandomizedPCA` is now factored into :class:`decomposition.PCA`
@@ -515,7 +521,7 @@ Decomposition, manifold learning and clustering
   - :class:`cluster.KMeans` and :class:`cluster.MiniBatchKMeans` now works
     with ``np.float32`` and ``np.float64`` input data without converting it.
     This allows to reduce the memory consumption by using ``np.float32``.
-    :issue:`6846` by :user:`Sebastian Säger <ssaeger>` and
+    :issue:`6846` by :user:`Sebastian Säger <ssaeger>` and
    :user:`YenChen Lin <yenchenlin>`.
 
 Preprocessing and feature selection
@@ -524,7 +530,7 @@ Preprocessing and feature selection
    :issue:`5929` by :user:`Konstantin Podshumok <podshumok>`.
 
   - :class:`feature_extraction.FeatureHasher` now accepts string values.
-    :issue:`6173` by :user:`Ryad Zenine <ryadzenine>` and
+    :issue:`6173` by :user:`Ryad Zenine <ryadzenine>` and
    :user:`Devashish Deshpande <dsquareindia>`.
 
   - Keyword arguments can now be supplied to ``func`` in
@@ -538,7 +544,7 @@ Preprocessing and feature selection
 Model evaluation and meta-estimators
 
   - :class:`multiclass.OneVsOneClassifier` and :class:`multiclass.OneVsRestClassifier`
-    now support ``partial_fit``. By :user:`Asish Panda <kaichogami>` and
+    now support ``partial_fit``. By :user:`Asish Panda <kaichogami>` and
    :user:`Philipp Dowling <phdowling>`.
 
   - Added support for substituting or disabling :class:`pipeline.Pipeline`
@@ -566,7 +572,7 @@ Metrics
 
   - Added ``labels`` flag to :class:`metrics.log_loss` to to explicitly provide
    the labels when the number of classes in ``y_true`` and ``y_pred`` differ.
-    :issue:`7239` by :user:`Hong Guangguo <hongguangguo>` with help from
+    :issue:`7239` by :user:`Hong Guangguo <hongguangguo>` with help from
    :user:`Mads Jensen <indianajensen>` and :user:`Nelson Liu <nelson-liu>`.
 
   - Support sparse contingency matrices in cluster evaluation
@@ -686,7 +692,7 @@ Decomposition, manifold learning and clustering
   - Fixed incorrect initialization of :func:`utils.arpack.eigsh` on all
    occurrences. Affects :class:`cluster.bicluster.SpectralBiclustering`,
    :class:`decomposition.KernelPCA`, :class:`manifold.LocallyLinearEmbedding`,
-    and :class:`manifold.SpectralEmbedding` (:issue:`5012`). By
+    and :class:`manifold.SpectralEmbedding` (:issue:`5012`). By
    :user:`Peter Fischer <yanlend>`.
 
   - Attribute ``explained_variance_ratio_`` calculated with the SVD solver
@@ -969,7 +975,7 @@ New features
    :class:`cross_validation.LabelShuffleSplit` generate train-test folds,
    respectively similar to :class:`cross_validation.KFold` and
    :class:`cross_validation.ShuffleSplit`, except that the folds are
-    conditioned on a label array. By `Brian McFee`_, :user:`Jean
+    conditioned on a label array. By `Brian McFee`_, :user:`Jean
    Kossaifi <JeanKossaifi>` and `Gilles Louppe`_.
 
   - :class:`decomposition.LatentDirichletAllocation` implements the Latent
@@ -1059,7 +1065,7 @@ Enhancements
    By `Trevor Stephens`_.
 
   - Provide an option for sparse output from
-    :func:`sklearn.metrics.pairwise.cosine_similarity`. By
+    :func:`sklearn.metrics.pairwise.cosine_similarity`. By
    :user:`Jaidev Deshpande <jaidevd>`.
 
   - Add :func:`minmax_scale` to provide a function interface for
@@ -1270,7 +1276,7 @@ Bug fixes
    By `Tom Dupre la Tour`_.
 
   - Fixed bug :issue:`5495` when
-    doing OVR(SVC(decision_function_shape="ovr")). Fixed by
+    doing OVR(SVC(decision_function_shape="ovr")). Fixed by
    :user:`Elvis Dohmatob <dohmatob>`.
 

examples/applications/topics_extraction_with_nmf_lda.py

Lines changed: 24 additions & 5 deletions
@@ -9,6 +9,10 @@
 The output is a list of topics, each represented as a list of terms
 (weights are not shown).
 
+Non-negative Matrix Factorization is applied with two different objective
+functions: the Frobenius norm, and the generalized Kullback-Leibler divergence.
+The latter is equivalent to Probabilistic Latent Semantic Indexing.
+
 The default parameters (n_samples / n_features / n_topics) should make
 the example runnable in a couple of tens of seconds. You can try to
 increase the dimensions of the problem, but be aware that the time
@@ -36,9 +40,10 @@
 
 def print_top_words(model, feature_names, n_top_words):
     for topic_idx, topic in enumerate(model.components_):
-        print("Topic #%d:" % topic_idx)
-        print(" ".join([feature_names[i]
-                        for i in topic.argsort()[:-n_top_words - 1:-1]]))
+        message = "Topic #%d: " % topic_idx
+        message += " ".join([feature_names[i]
+                             for i in topic.argsort()[:-n_top_words - 1:-1]])
+        print(message)
     print()
 
 
@@ -71,17 +76,31 @@ def print_top_words(model, feature_names, n_top_words):
 t0 = time()
 tf = tf_vectorizer.fit_transform(data_samples)
 print("done in %0.3fs." % (time() - t0))
+print()
 
 # Fit the NMF model
-print("Fitting the NMF model with tf-idf features, "
+print("Fitting the NMF model (Frobenius norm) with tf-idf features, "
       "n_samples=%d and n_features=%d..."
       % (n_samples, n_features))
 t0 = time()
 nmf = NMF(n_components=n_topics, random_state=1,
           alpha=.1, l1_ratio=.5).fit(tfidf)
 print("done in %0.3fs." % (time() - t0))
 
-print("\nTopics in NMF model:")
+print("\nTopics in NMF model (Frobenius norm):")
+tfidf_feature_names = tfidf_vectorizer.get_feature_names()
+print_top_words(nmf, tfidf_feature_names, n_top_words)
+
+# Fit the NMF model
+print("Fitting the NMF model (generalized Kullback-Leibler divergence) with "
+      "tf-idf features, n_samples=%d and n_features=%d..."
+      % (n_samples, n_features))
+t0 = time()
+nmf = NMF(n_components=n_topics, random_state=1, beta_loss='kullback-leibler',
+          solver='mu', max_iter=1000, alpha=.1, l1_ratio=.5).fit(tfidf)
+print("done in %0.3fs." % (time() - t0))
+
+print("\nTopics in NMF model (generalized Kullback-Leibler divergence):")
 tfidf_feature_names = tfidf_vectorizer.get_feature_names()
 print_top_words(nmf, tfidf_feature_names, n_top_words)

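(Editor's note, not part of the commit.) A self-contained, toy-sized variant of the comparison made in this example, assuming only the public ``TfidfVectorizer`` and ``NMF`` APIs; the corpus and parameters are made up for illustration:

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.decomposition import NMF

    docs = ["cats purr and sleep", "dogs bark and sleep",
            "stocks fall and rise", "markets rise and fall"]
    tfidf = TfidfVectorizer(stop_words='english').fit_transform(docs)

    # Same two objectives as above: Frobenius norm with the default 'cd'
    # solver, generalized Kullback-Leibler with the new 'mu' solver.
    for beta_loss, solver in [('frobenius', 'cd'), ('kullback-leibler', 'mu')]:
        nmf = NMF(n_components=2, beta_loss=beta_loss, solver=solver,
                  max_iter=1000, random_state=0).fit(tfidf)
        print(beta_loss, nmf.components_.round(2))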
examples/decomposition/plot_beta_divergence.py

Lines changed: 29 additions & 0 deletions
@@ -0,0 +1,29 @@
+"""
+==============================
+Beta-divergence loss functions
+==============================
+
+A plot that compares the various Beta-divergence loss functions supported by
+the Multiplicative-Update ('mu') solver in :class:`sklearn.decomposition.NMF`.
+"""
+import numpy as np
+import matplotlib.pyplot as plt
+from sklearn.decomposition.nmf import _beta_divergence
+
+print(__doc__)
+
+x = np.linspace(0.001, 4, 1000)
+y = np.zeros(x.shape)
+
+colors = 'mbgyr'
+for j, beta in enumerate((0., 0.5, 1., 1.5, 2.)):
+    for i, xi in enumerate(x):
+        y[i] = _beta_divergence(1, xi, 1, beta)
+    name = "beta = %1.1f" % beta
+    plt.plot(x, y, label=name, color=colors[j])
+
+plt.xlabel("x")
+plt.title("beta-divergence(1, x)")
+plt.legend(loc=0)
+plt.axis([0, 4, 0, 3])
+plt.show()
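(Editor's note, not part of the commit.) A quick sanity check of the special cases plotted above, reusing the same private helper and scalar call pattern as the script (so it assumes this commit's ``sklearn.decomposition.nmf`` module); the hand-written expressions follow the Frobenius, KL and IS definitions from the documentation, evaluated at X = 1, Y = x:

    import numpy as np
    from sklearn.decomposition.nmf import _beta_divergence

    x = 0.5
    frobenius = 0.5 * (1 - x) ** 2                  # beta = 2
    kullback_leibler = np.log(1 / x) - 1 + x        # beta = 1
    itakura_saito = 1 / x - np.log(1 / x) - 1       # beta = 0

    assert np.isclose(_beta_divergence(1, x, 1, 2.), frobenius)
    assert np.isclose(_beta_divergence(1, x, 1, 1.), kullback_leibler)
    assert np.isclose(_beta_divergence(1, x, 1, 0.), itakura_saito)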

0 commit comments
