DOC improved the documentation · scikit-learn/scikit-learn@85f6e8e · GitHub
Commit 85f6e8e

DOC improved the documentation
1 parent 2bf41cb commit 85f6e8e

File tree

2 files changed: +73, -58 lines

doc/modules/svm.rst

Lines changed: 62 additions & 53 deletions
@@ -702,19 +702,20 @@ of a high-dimensional distribution by constructing a supporting hyperplane
 in the feature space corresponding to the kernel, which effectively
 separates the data set from the origin with maximum margin.
 
-The One-Class SVM solves the following primal problem:
+For the training sample :math:`(x_i)_{i=1}^l` with weights :math:`(w_i)_{i=1}^l`,
+:math:`\sum_{i=1}^l w_i>0`, the One-Class SVM solves the following primal problem:
 
 
 .. math::
 
-   \min_{\rho,\xi,w} \frac12 w^Tw - \rho + \frac{1}{\nu l} \sum_{i=1}^l \xi_i\,,\\
+   \min_{\rho,\xi,w} \frac12 w^Tw - \rho + \frac{1}{\nu W} \sum_{i=1}^l w_i \xi_i \,, \\
 
-   \textrm {subject to } & w^T\phi(x_i) \geq \rho - \xi_i\,,\\
-   & \xi_i \geq 0, i=1, \ldots, l\,,
+   \textrm {subject to } & w^T\phi(x_i) \geq \rho - \xi_i \,, \\
+   & \xi_i \geq 0\,,\, i=1, \ldots, l \,,
 
 
 where :math:`\phi(\cdot)` is the feature map associated with the
-kernel :math:`K(\cdot,\cdot)`.
+kernel :math:`K(\cdot,\cdot)`, and :math:`W = \sum_{i=1}^l w_i`.
 
 The dual problem is
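
A minimal sketch of how the sample weights :math:`(w_i)_{i=1}^l` of this
weighted primal enter through the estimator API, using the released
`sklearn.svm.OneClassSVM` (the data and parameter values are illustrative
assumptions, not part of the commit)::

    import numpy as np
    from sklearn.svm import OneClassSVM

    rng = np.random.RandomState(0)
    X = rng.randn(100, 2)
    # the (w_i) of the primal; any non-negative weights with sum(w) > 0
    w = rng.uniform(0.5, 2.0, size=100)

    clf = OneClassSVM(nu=0.1, kernel="rbf", gamma=0.5)
    clf.fit(X, sample_weight=w)  # the normalizer W = sum(w) is handled internally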

@@ -723,25 +724,25 @@ The dual problem is
 
    \min_\alpha \frac12 \alpha^T Q\alpha\,\\
 
-   \textrm {subject to } & 0\leq \alpha_i \leq \frac{1}{\nu l}, i=1, \ldots, l \,,\\
-   & e^T\alpha = 1 \,,
+   \textrm {subject to } & 0\leq \alpha_i \leq w_i\,,\, i=1, \ldots, l \,,\\
+   & e^T\alpha = \nu W \,,
 
 
 where :math:`e\in \mathbb{R}^{l\times 1}` is the vector of ones and
 :math:`Q_{ij} = K(x_i, x_j)` is the kernel Gram matrix.
 
-The decision function is given by:
+The optimal decision function is given by:
 
-.. math:: \operatorname{sgn}(\sum_{i=1}^l \alpha_i K(x_i, x) - \rho) \,,
+.. math:: x\mapsto \operatorname{sgn}(\sum_{i=1}^l \alpha_i K(x_i, x) - \rho) \,,
 
 where :math:`+1` indicates an inlier and :math:`-1` an outlier.
 
 The parameter :math:`\nu\in(0,1]` determines the fraction of outliers
 in the training dataset. More technically :math:`\nu` is:
-* an upper bound on the fraction training points outside
-  the estimated region;
+* an upper bound on the fraction of the training points lying outside
+  the estimated region;
 
-* a lower bound on the fraction of support vectors.
+* a lower bound on the fraction of support vectors.
 
 .. topic:: References:
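
The two bulleted properties of :math:`\nu` can be checked empirically. A
short sketch with the released `sklearn.svm.OneClassSVM` (the dataset and
parameters are illustrative; on finite samples the bounds hold approximately)::

    import numpy as np
    from sklearn.svm import OneClassSVM

    rng = np.random.RandomState(0)
    X = rng.randn(200, 2)

    nu = 0.1
    clf = OneClassSVM(nu=nu, kernel="rbf", gamma=0.5).fit(X)

    outlier_frac = np.mean(clf.predict(X) == -1)   # fraction outside the region
    support_frac = len(clf.support_) / X.shape[0]  # fraction of support vectors
    print(outlier_frac, "<=", nu, "<=", support_frac)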

@@ -755,52 +756,51 @@ in the training dataset. More technically :math:`\nu` is:
 
 SVDD
 ----
-Support Vector Data Description (SVDD-L1) for Unsupervised Outlier
-Detection.
-
-This model, proposed by Tax and Duin (2004), aims at finding spherically
-shaped boundary around a data set. The original formulation suffered from
-non-convexity issues related to optimality of the attained solution for
-certain values of regularization parameter `C > 0`. Chang, Lee, and Lin
-(2013) suggested a reformulation of the SVDD was proposed which had provably
-unique global solution for any `C`.
-
-Scikit's SVDD model is a modified version of the 2013 SVDD formulation,
-cf. problem (7) of Chang et al. (2013). The major change is that observations
-can have different penalty costs :math:`(C_i)_{i=1}^l` instead of a single
-cost :math:`C`. The theorems 2-4 of Chang et al. (2013) can be extended to
-the case of vector of penalty costs :math:`C` with :math:`\sum_{i=1}^l C_i > 1`
-being the condition, distinguishing the case of :math:`R>0` (theorem 4 case 1)
-from :math:`R=0` (theorem 4 case 2).
-
-The second change is concerned with the model parametrization. In the original
-and 2013 formulations the parameter `C` was difficult to interpret directly:
-its reciprocal determines the approximate number of support vectors of the
-hypersphere. To improve interpretability and provide unified interface to the
-One-Class SVM model, the Scikit's SVDD is parametrized with :math:`\nu\in(0, 1]`,
-which determines the fraction of outliers in the training sample. The
-reparametrization in question is:
-
-.. math:: C_i = \frac{w_i}{\nu \sum_{i=1}^l w_i} \,,
-
-where :math:`(w_i)_{i=1}^l` are non-negative sample weights. Note that in a
-typical run of the SVDD model the weights are set to :math:`w_i = 1`, which
-is equivalent to the original 2013 SVDD problem for :math:`C = \frac{1}{\nu l}`.
-
-In the case of a stationary kernel :math:`K(x,y)=K(x-y)` the SVDD model
-and the One-Class SVM models are provably equivalent
-(see :ref:`outlier_detection_ocsvm_vs_svdd`).
+
+Support Vector Data Description (SVDD), proposed by Tax and Duin (2004),
+aims at finding a spherically shaped boundary around a data set. The original
+formulation suffered from non-convexity issues related to the optimality of
+the attained solution for certain values of the regularization parameter :math:`C`.
+Chang, Lee, and Lin (2013) suggested a reformulation of the SVDD model
+which has a well-defined and provably unique global solution for any :math:`C>0`.
+
+Scikit's SVDD model is a modified version of the 2013 SVDD formulation.
+The following changes were made to problem (7) in Chang et al. (2013):
+
+* **sample weights**: instead of a uniform penalty :math:`C>0`, sample
+  observations are allowed to have different costs :math:`(C_i)_{i=1}^l`,
+  :math:`\sum_{i=1}^l C_i > 0`;
+
+* :math:`\nu`-**parametrization**: the penalties are determined by
+  :math:`C_i = \frac{w_i}{\nu \sum_{i=1}^l w_i}`, where :math:`\nu\in(0, 1]`
+  and :math:`(w_i)_{i=1}^l` are non-negative sample weights.
+
+A straightforward extension of theorems 2-4 of Chang et al. (2013) to the
+case of different penalties yields :math:`\sum_{i=1}^l C_i > 1`, or
+equivalently :math:`\nu < 1`, as the condition distinguishing the case of
+:math:`R>0` (theorem 4 case 1) from that of :math:`R=0` (theorem 4 case 2).
+
+The main benefit of the :math:`\nu`-parametrization is a clearer
+interpretation and a unified interface with the :ref:`svm_one_class_svm`
+model. Under the :math:`C`-parametrization the value :math:`\frac{1}{C}`
+determines the approximate average number of support vectors of the
+hypersphere, whereas under the :math:`\nu`-parametrization the parameter
+determines the fraction of outliers in the training sample (similarly to
+the :ref:`svm_one_class_svm`).
+
+Note that in a typical run of the SVDD model the weights are set to :math:`w_i = 1`,
+which is equivalent to the original 2013 SVDD formulation for :math:`C = \frac{1}{\nu l}`.
 
 The primal problem of Scikit's version of SVDD for the training sample
-:math:`(x_i)_{i=1}^l` with weights :math:`(w_i)_{i=1}^l` is:
+:math:`(x_i)_{i=1}^l` with weights :math:`(w_i)_{i=1}^l`,
+:math:`\sum_{i=1}^l w_i>0`, is:
 
 
 .. math::
 
    \min_{R,\xi,a} R + \frac{1}{\nu W} \sum_{i=1}^l w_i \xi_i\,,\\
 
    \textrm {subject to } & \|\phi(x_i) - a\|^2 \leq R + \xi_i\,,\\
-   & \xi_i \geq 0, i=1, \ldots, l\,,\\
+   & \xi_i \geq 0\,,\, i=1, \ldots, l\,,\\
    & R \geq 0\,,
 
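
The :math:`\nu`-reparametrization above is easy to sanity-check numerically.
A hypothetical NumPy snippet (the helper below is illustrative, not a
library function)::

    import numpy as np

    def penalties(w, nu):
        """Per-sample costs C_i = w_i / (nu * sum(w))."""
        w = np.asarray(w, dtype=float)
        return w / (nu * w.sum())

    l, nu = 5, 0.2
    C = penalties(np.ones(l), nu)          # unit weights ...
    assert np.allclose(C, 1.0 / (nu * l))  # ... recover the 2013 penalty 1/(nu*l)
    # nu < 1 gives sum(C_i) = 1/nu > 1, i.e. the R > 0 regime (theorem 4, case 1)
    print(C.sum())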
@@ -816,29 +816,38 @@ reduces to an unconstrained convex optimization problem independent of
 Note that in this case every observation is an outlier.
 
 In the case when :math:`\nu < 1` the constraint :math:`R\geq 0` is redundant,
-and the dual problem has the form:
+strong duality holds, and the dual problem has the form:
 
 
 .. math::
 
    \min_\alpha \frac12 \alpha^T Q\alpha - \frac{\nu W}{2} \sum_{i=1}^l \alpha_i Q_{ii}\,,\\
 
-   \textrm {subject to } & 0 \leq \alpha_i \leq w_i, \ldots, l\,,\\
+   \textrm {subject to } & 0 \leq \alpha_i \leq w_i\,,\, i=1, \ldots, l\,,\\
    & e^T \alpha = \nu W\,,
 
 
 where :math:`e\in \mathbb{R}^{l\times 1}` is the vector of ones and
 :math:`Q_{ij} = K(x_i, x_j)` is the kernel Gram matrix.
 
-The decision function of SVDD is given by:
+The decision function of the SVDD is given by:
 
-.. math:: \operatorname{sgn}(R - \|\phi(x) - a\|^2) \,,
+.. math:: x\mapsto \operatorname{sgn}(R - \|\phi(x) - a\|^2) \,,
 
 where :math:`+1` indicates an inlier and :math:`-1` an outlier. The
 distances in the feature space and :math:`R` are computed implicitly through
 the coefficients and the optimal value of the objective of the corresponding
 dual problem.
 
+It is worth noting that in the case of a stationary kernel :math:`K(x,y)=K(x-y)`
+the SVDD and One-Class SVM models are provably equivalent. Indeed, the values
+:math:`Q_{ii} = K(x_i, x_i)` in the last term of the dual of the SVDD are all
+equal to :math:`K(0)`, which makes the whole term independent of :math:`\alpha`.
+Therefore the objective functions of the dual problems of the One-Class SVM
+and the SVDD are equivalent up to a constant. This, however, **does not imply**
+that one model generalizes the other: their solutions just happen to coincide
+for a particular family of kernels (see :ref:`outlier_detection_ocsvm_vs_svdd`).
+
 .. topic:: References:
 
 * `Support vector data description
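
The :math:`Q_{ii} = K(0)` argument in the appended equivalence note can be
verified directly on a Gram matrix. A sketch using kernels from
`sklearn.metrics.pairwise` (the data is illustrative)::

    import numpy as np
    from sklearn.metrics.pairwise import polynomial_kernel, rbf_kernel

    X = np.random.RandomState(0).randn(10, 3)

    # stationary RBF kernel: every Q_ii equals K(0) = 1, so the extra
    # SVDD dual term is constant on the feasible set e^T alpha = nu * W
    print(np.ptp(np.diag(rbf_kernel(X, gamma=0.7))))        # 0.0

    # non-stationary polynomial kernel: the diagonal varies with x_i,
    # and the OCSVM and SVDD duals are genuinely different problems
    print(np.ptp(np.diag(polynomial_kernel(X, degree=3))))  # > 0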

examples/svm/plot_oneclass_vs_svdd.py

Lines changed: 11 additions & 5 deletions
@@ -3,17 +3,23 @@
 One-Class SVM versus SVDD
 =========================
 
-An example using a comparing One-Class SVM against SVDD for novelty
+An example comparing the One-Class SVM and SVDD models for novelty
 detection.
 
 :ref:`Support Vector Data Description (SVDD) <svm_outlier_detection>`
 and :ref:`One-Class SVM <svm_outlier_detection>` are unsupervised
-algorithms that learn a decision function for novelty detection:
-classifying new data as similar or different to the training set.
+algorithms that learn a decision function for novelty detection, i.e.
+the problem of classifying new data as similar or different to the
+training set.
 
 It can be shown that the One-Class SVM and SVDD models yield identical
-results in the case of a stationary kernel, like RBF. This example
-demonstrates this.
+results in the case of a stationary kernel, like RBF, but produce different
+decision functions for non-stationary kernels, e.g. polynomial. This
+example demonstrates this.
+
+Note that it is incorrect to say that the SVDD generalizes the One-Class
+SVM: these are different models, which just happen to coincide for a
+particular family of kernels.
 """
 print(__doc__)
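
A hedged sketch of the kind of comparison this example carries out (the
`SVDD` estimator is assumed to be the one added on this branch; it is not
part of released scikit-learn, so the corresponding lines are left
commented)::

    import numpy as np
    from sklearn.svm import OneClassSVM  # , SVDD  -- available on this branch

    rng = np.random.RandomState(0)
    X = rng.randn(150, 2)

    for kernel, params in [("rbf", {"gamma": 0.5}), ("poly", {"degree": 3})]:
        ocsvm = OneClassSVM(nu=0.2, kernel=kernel, **params).fit(X)
        # svdd = SVDD(nu=0.2, kernel=kernel, **params).fit(X)
        # identical labels for the stationary RBF kernel, different for poly:
        # print(kernel, np.array_equal(ocsvm.predict(X), svdd.predict(X)))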

0 commit comments