ENH add multiplicative-update solver in NMF, with all beta-divergence · scikit-learn/scikit-learn@d9d7d44

Commit d9d7d44

ENH add multiplicative-update solver in NMF, with all beta-divergence
1 parent 2a5652a commit d9d7d44

File tree

4 files changed, +897 −125 lines changed


doc/modules/decomposition.rst

Lines changed: 61 additions & 11 deletions
@@ -647,21 +647,20 @@ components with some sparsity:
 Non-negative matrix factorization (NMF or NNMF)
 ===============================================
 
+NMF with the Frobenius norm
+---------------------------
+
 :class:`NMF` is an alternative approach to decomposition that assumes that the
 data and the components are non-negative. :class:`NMF` can be plugged in
 instead of :class:`PCA` or its variants, in the cases where the data matrix
-does not contain negative values.
-It finds a decomposition of samples :math:`X`
-into two matrices :math:`W` and :math:`H` of non-negative elements,
-by optimizing for the squared Frobenius norm:
+does not contain negative values. It finds a decomposition of samples
+:math:`X` into two matrices :math:`W` and :math:`H` of non-negative elements,
+by optimizing the distance :math:`d` between :math:`X` and the matrix product
+:math:`WH`. The most widely used distance function is the squared Frobenius
+norm, which is an obvious extension of the Euclidean norm to matrices:
 
 .. math::
-    \arg\min_{W,H} \frac{1}{2} ||X - WH||_{Fro}^2 = \frac{1}{2} \sum_{i,j} (X_{ij} - {WH}_{ij})^2
-
-This norm is an obvious extension of the Euclidean norm to matrices. (Other
-optimization objectives have been suggested in the NMF literature, in
-particular Kullback-Leibler divergence, but these are not currently
-implemented.)
+    d_{Fro}(X, Y) = \frac{1}{2} ||X - Y||_{Fro}^2 = \frac{1}{2} \sum_{i,j} (X_{ij} - Y_{ij})^2
 
 Unlike :class:`PCA`, the representation of a vector is obtained in an additive
 fashion, by superimposing the components, without subtracting. Such additive
@@ -715,7 +714,7 @@ and the intensity of the regularization with the :attr:`alpha`
 and the regularized objective function is:
 
 .. math::
-    \frac{1}{2} ||X - WH||_{Fro}^2
+    d_{Fro}(X, WH)
     + \alpha \rho ||W||_1 + \alpha \rho ||H||_1
     + \frac{\alpha(1-\rho)}{2} ||W||_{Fro}^2
     + \frac{\alpha(1-\rho)}{2} ||H||_{Fro}^2
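
As an editorial aside (not part of this commit), the regularized objective above is easy to make concrete with plain NumPy. In the sketch below, `alpha` is the overall regularization strength and `rho` the L1/L2 mixing ratio denoted :math:`\rho` in the formula; the function is illustrative only and is not the library's internal implementation.

import numpy as np

def regularized_frobenius_objective(X, W, H, alpha=0.1, rho=0.5):
    # d_Fro(X, WH) + alpha * rho * (||W||_1 + ||H||_1)
    #             + alpha * (1 - rho) / 2 * (||W||_Fro^2 + ||H||_Fro^2)
    residual = X - np.dot(W, H)
    d_fro = 0.5 * np.sum(residual ** 2)
    l1_penalty = alpha * rho * (np.abs(W).sum() + np.abs(H).sum())
    l2_penalty = 0.5 * alpha * (1 - rho) * ((W ** 2).sum() + (H ** 2).sum())
    return d_fro + l1_penalty + l2_penalty

rng = np.random.RandomState(0)
X = np.abs(rng.randn(6, 5))
W = np.abs(rng.randn(6, 2))
H = np.abs(rng.randn(2, 5))
print(regularized_frobenius_objective(X, W, H))
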
@@ -724,10 +723,57 @@ and the regularized objective function is:
 :func:`non_negative_factorization` allows a finer control through the
 :attr:`regularization` attribute, and may regularize only W, only H, or both.
 
+NMF with a beta-divergence
+--------------------------
+
+As described previously, the most widely used distance function is the squared
+Frobenius norm, which is an obvious extension of the Euclidean norm to
+matrices:
+
+.. math::
+    d_{Fro}(X, Y) = \frac{1}{2} ||X - Y||_{Fro}^2 = \frac{1}{2} \sum_{i,j} (X_{ij} - Y_{ij})^2
+
+However, NMF can also be used with a different function to measure the
+distance between :math:`X` and the matrix product :math:`WH`. Another typical
+distance function used in NMF is the (generalized) Kullback-Leibler (KL)
+divergence, also referred to as I-divergence:
+
+.. math::
+    d_{KL}(X, Y) = \sum_{i,j} (X_{ij} \log(\frac{X_{ij}}{Y_{ij}}) - X_{ij} + Y_{ij})
+
+Another, less commonly used, distance is the Itakura-Saito (IS) divergence:
+
+.. math::
+    d_{IS}(X, Y) = \sum_{i,j} (\frac{X_{ij}}{Y_{ij}} - \log(\frac{X_{ij}}{Y_{ij}}) - 1)
+
+These three distances are special cases of the beta-divergence family, with
+:math:`\beta = 2, 1, 0` respectively [Fevotte, 2011]. The beta-divergence is
+defined by:
+
+.. math::
+    d_{\beta}(X, Y) = \sum_{i,j} \frac{1}{\beta(\beta - 1)}(X_{ij}^\beta + (\beta-1)Y_{ij}^\beta - \beta X_{ij} Y_{ij}^{\beta - 1})
+
+Note that this definition is not valid for :math:`\beta = 0` or
+:math:`\beta = 1`, yet it can be continuously extended to the definitions of
+:math:`d_{IS}` and :math:`d_{KL}` respectively.
+
+:class:`NMF` implements three solvers, using Projected Gradient ('pg')
+(deprecated, to be removed in 0.19), Coordinate Descent ('cd'), and
+Multiplicative Update ('mu'). The 'mu' solver can optimize every
+beta-divergence, including of course the Frobenius norm (:math:`\beta=2`), the
+(generalized) Kullback-Leibler divergence (:math:`\beta=1`) and the
+Itakura-Saito divergence (:math:`\beta=0`). Note that for
+:math:`\beta \in (1; 2)`, the 'mu' solver is significantly faster.
+
+The 'cd' and 'pg' solvers can only optimize the Frobenius norm. Due to the
+underlying non-convexity of NMF, the different solvers may converge to
+different minima, even when optimizing the same distance function.
+
 .. topic:: Examples:
 
     * :ref:`example_decomposition_plot_faces_decomposition.py`
     * :ref:`example_applications_topics_extraction_with_nmf_lda.py`
+    * :ref:`example_decomposition_plot_beta_divergence.py`
 
 .. topic:: References:
 
@@ -753,6 +799,10 @@ and the regularized objective function is:
       <http://www.bsp.brain.riken.jp/publications/2009/Cichocki-Phan-IEICE_col.pdf>`_
       A. Cichocki, P. Anh-Huy, 2009
 
+    * `"Algorithms for nonnegative matrix factorization with the beta-divergence"
+      <http://arxiv.org/pdf/1010.1763v3.pdf>`_
+      C. Fevotte, J. Idier, 2011
+
 
 .. _LatentDirichletAllocation:
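
As an editorial note on how the solvers discussed in the documentation diff are selected from user code: the sketch below is written against the parameter names of the released scikit-learn API (solver='mu' together with beta_loss), which may differ from the exact signatures in this in-progress commit; treat it as an illustration of the intended workflow, not code taken from the commit.

import numpy as np
from sklearn.decomposition import NMF

rng = np.random.RandomState(0)
X = np.abs(rng.randn(20, 10))  # NMF expects a non-negative data matrix

# Multiplicative updates minimizing the generalized Kullback-Leibler
# divergence (beta = 1); 'frobenius' (beta = 2) and 'itakura-saito'
# (beta = 0) are the other named losses.
model = NMF(n_components=4, solver='mu', beta_loss='kullback-leibler',
            init='random', random_state=0, max_iter=500)
W = model.fit_transform(X)
H = model.components_
print(W.shape, H.shape)
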

Lines changed: 30 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,30 @@
+"""
+==============================
+Beta-divergence loss functions
+==============================
+
+A plot that compares the various Beta-divergence loss functions supported by
+the Multiplicative-Update ('mu') solver in :class:`sklearn.decomposition.NMF`.
+"""
+print(__doc__)
+
+import numpy as np
+import matplotlib.pyplot as plt
+from sklearn.decomposition.nmf import beta_divergence
+
+x = np.linspace(0.001, 4, 1000)
+y = np.zeros(x.shape)
+
+colors = 'mbgyr'
+for j, beta in enumerate((0., 0.5, 1., 1.5, 2.)):
+    for i, xi in enumerate(x):
+        y[i] = beta_divergence(1, xi, 1, beta)
+    name = "beta = %1.1f" % beta
+    plt.plot(x, y, label=name, color=colors[j])
+
+plt.xlabel("x")
+plt.title("beta-divergence(1, x)")
+plt.legend(loc=0)
+plt.axis([0, 4, 0, 3])
+plt.show()
+plt.close('all')
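
The example above plots d_beta(1, x) with the commit's beta_divergence helper. As an editorial aside, the special-case relationships stated in the documentation diff can also be checked numerically with plain NumPy, independently of the new helper; the beta_div function below is a local illustration and is not part of scikit-learn.

import numpy as np

def beta_div(X, Y, beta):
    # Sum of the element-wise beta-divergence d_beta(X, Y); the closed-form
    # special cases correspond to beta = 2 (Frobenius), 1 (KL) and 0 (IS).
    X, Y = np.asarray(X, dtype=float), np.asarray(Y, dtype=float)
    if beta == 2:
        return 0.5 * np.sum((X - Y) ** 2)
    if beta == 1:
        return np.sum(X * np.log(X / Y) - X + Y)
    if beta == 0:
        return np.sum(X / Y - np.log(X / Y) - 1)
    return np.sum((X ** beta + (beta - 1) * Y ** beta
                   - beta * X * Y ** (beta - 1)) / (beta * (beta - 1)))

rng = np.random.RandomState(0)
X = rng.rand(4, 3) + 0.1  # strictly positive, so log and division are valid
Y = rng.rand(4, 3) + 0.1

# The generic formula matches the Frobenius case exactly and approaches the
# KL and IS definitions as beta tends to 1 and 0 respectively.
print(np.isclose(beta_div(X, Y, 2), 0.5 * np.linalg.norm(X - Y) ** 2))
print(np.isclose(beta_div(X, Y, 1 + 1e-6), beta_div(X, Y, 1)))
print(np.isclose(beta_div(X, Y, 1e-6), beta_div(X, Y, 0)))
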

0 commit comments
