@@ -702,19 +702,20 @@ of a high-dimensional distribution by constructing a supporting hyperplane
in the feature space corresponding to the kernel, which effectively
separates the data set from the origin with maximum margin.
- The One-Class SVM solves the following primal problem:
+ For the training sample :math:`(x_i)_{i=1}^l` with weights :math:`(w_i)_{i=1}^l`,
+ :math:`\sum_{i=1}^l w_i > 0`, the One-Class SVM solves the following primal problem:

.. math::

- \min_{\rho,\xi,w} \frac12 w^Tw - \rho + \frac{1}{\nu l} \sum_{i=1}^l \xi_i\,,\\
+ \min_{\rho,\xi,w} \frac12 w^Tw - \rho + \frac{1}{\nu W} \sum_{i=1}^l w_i \xi_i\,,\\

- \textrm{subject to } & w^T\phi(x_i) \geq \rho - \xi_i\,,\\
- & \xi_i \geq 0, i=1, \ldots, l\,,
+ \textrm{subject to } & w^T\phi(x_i) \geq \rho - \xi_i\,,\\
+ & \xi_i \geq 0\,,\, i=1, \ldots, l\,,

where :math:`\phi(\cdot)` is the feature map associated with the
- kernel :math:`K(\cdot,\cdot)`.
+ kernel :math:`K(\cdot,\cdot)`, and :math:`W = \sum_{i=1}^l w_i`.
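As a quick illustration (not part of this commit), the weighted primal problem above can be exercised through the public ``OneClassSVM`` estimator; the sketch below assumes that ``sample_weight`` passed to ``fit`` plays the role of the :math:`w_i`::

    import numpy as np
    from sklearn.svm import OneClassSVM

    rng = np.random.RandomState(0)
    X = rng.randn(100, 2)                    # training sample (x_i), i = 1..l
    w = rng.uniform(0.5, 2.0, size=100)      # non-negative sample weights w_i

    # nu in (0, 1] controls the trade-off between the margin term and the
    # weighted slack penalties in the primal objective above.
    clf = OneClassSVM(kernel="rbf", gamma=0.5, nu=0.1)
    clf.fit(X, sample_weight=w)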
The dual problem is
@@ -723,25 +724,25 @@ The dual problem is
\min_\alpha \frac12 \alpha^T Q\alpha\,\\

- \textrm{subject to } & 0 \leq \alpha_i \leq \frac{1}{\nu l}, i=1, \ldots, l\,,\\
- & e^T\alpha = 1\,,
+ \textrm{subject to } & 0 \leq \alpha_i \leq w_i\,,\, i=1, \ldots, l\,,\\
+ & e^T\alpha = \nu W\,,

where :math:`e \in \mathbb{R}^{l\times 1}` is the vector of ones and
:math:`Q_{ij} = K(x_i, x_j)` is the kernel Gram matrix.
- The decision function is given by:
+ The optimal decision function is given by:
- .. math:: \operatorname{sgn}(\sum_{i=1}^l \alpha_i K(x_i, x) - \rho)\,,
+ .. math:: x \mapsto \operatorname{sgn}(\sum_{i=1}^l \alpha_i K(x_i, x) - \rho)\,,

where :math:`+1` indicates an inlier and :math:`-1` an outlier.
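A minimal sketch (not part of this commit) of how this decision function can be reconstructed from the fitted estimator's attributes; it assumes the usual scikit-learn convention that ``dual_coef_`` holds the (scaled) :math:`\alpha_i` of the support vectors and ``intercept_`` holds :math:`-\rho`::

    import numpy as np
    from sklearn.svm import OneClassSVM
    from sklearn.metrics.pairwise import rbf_kernel

    rng = np.random.RandomState(0)
    X = rng.randn(200, 2)
    gamma = 0.5

    clf = OneClassSVM(kernel="rbf", gamma=gamma, nu=0.2).fit(X)

    X_new = rng.randn(5, 2)
    K = rbf_kernel(X_new, clf.support_vectors_, gamma=gamma)
    # sum_i alpha_i K(x_i, x) - rho, restricted to the support vectors
    manual = K @ clf.dual_coef_.ravel() + clf.intercept_

    print(np.allclose(manual, clf.decision_function(X_new)))  # expected: True
    print(np.sign(manual))  # +1 for inliers, -1 for outliers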
The parameter :math:`\nu \in (0, 1]` determines the fraction of outliers
in the training dataset (an empirical check follows the list below).
More technically, :math:`\nu` is:
- * an upper bound on the fraction training points outside
- the estimated region;
+ * an upper bound on the fraction of the training points lying outside
+ the estimated region;
- * a lower bound on the fraction of support vectors.
+ * a lower bound on the fraction of support vectors.
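These two bounds are easy to check empirically; the following sketch (not part of this commit) fits ``OneClassSVM`` for several values of :math:`\nu` and reports the observed fractions::

    import numpy as np
    from sklearn.svm import OneClassSVM

    rng = np.random.RandomState(0)
    X = rng.randn(1000, 2)

    for nu in (0.05, 0.1, 0.25, 0.5):
        clf = OneClassSVM(kernel="rbf", gamma=0.5, nu=nu).fit(X)
        frac_out = np.mean(clf.predict(X) == -1)   # should not exceed nu (up to numerics)
        frac_sv = len(clf.support_) / len(X)       # should be at least nu
        print(f"nu={nu:.2f}  outliers={frac_out:.3f}  support vectors={frac_sv:.3f}")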
.. topic:: References:
@@ -755,52 +756,51 @@ in the training dataset. More technically :math:`\nu` is:
SVDD
----
- Support Vector Data Description (SVDD-L1) for Unsupervised Outlier
- Detection.
-
- This model, proposed by Tax and Duin (2004), aims at finding spherically
- shaped boundary around a data set. The original formulation suffered from
- non-convexity issues related to optimality of the attained solution for
- certain values of regularization parameter `C > 0`. Chang, Lee, and Lin
- (2013) suggested a reformulation of the SVDD was proposed which had provably
- unique global solution for any `C`.
-
- Scikit's SVDD model is a modified version of the 2013 SVDD formulation,
- cf. problem (7) of Chang et al. (2013). The major change is that observations
- can have different penalty costs :math:`(C_i)_{i=1}^l` instead of a single
- cost :math:`C`. The theorems 2-4 of Chang et al. (2013) can be extended to
- the case of vector of penalty costs :math:`C` with :math:`\sum_{i=1}^l C_i > 1`
- being the condition, distinguishing the case of :math:`R>0` (theorem 4 case 1)
- from :math:`R=0` (theorem 4 case 2).
-
- The second change is concerned with the model parametrization. In the original
- and 2013 formulations the parameter `C` was difficult to interpret directly:
- its reciprocal determines the approximate number of support vectors of the
- hypersphere. To improve interpretability and provide unified interface to the
- One-Class SVM model, the Scikit's SVDD is parametrized with :math:`\nu\in (0, 1]`,
- which determines the fraction of outliers in the training sample. The
- reparametrization in question is:
-
- .. math:: C_i = \frac{w_i}{\nu \sum_{i=1}^l w_i} \,,
-
- where :math:`(w_i)_{i=1}^l` are non-negative sample weights. Note that in a
- typical run of the SVDD model the weights are set to :math:`w_i = 1`, which
- is equivalent to the original 2013 SVDD problem for :math:`C = \frac{1}{\nu l}`.
-
- In the case of a stationary kernel :math:`K(x,y)=K(x-y)` the SVDD model
- and the One-Class SVM models are provably equivalent
- (see :ref:`outlier_detection_ocsvm_vs_svdd`).
+
+ Support Vector Data Description (SVDD), proposed by Tax and Duin (2004),
+ aims at finding a spherically shaped boundary around a data set. The original
+ formulation suffered from non-convexity issues related to optimality of
+ the attained solution for certain values of the regularization parameter :math:`C`.
+ Chang, Lee, and Lin (2013) suggested a reformulation of the SVDD model
+ which had a well-defined and provably unique global solution for any :math:`C>0`.
+
+ Scikit's SVDD model is a modified version of the 2013 SVDD formulation.
+ The following changes were made to problem (7) in Chang et al. (2013):
+
+ * **sample weights**: instead of a uniform penalty :math:`C>0`, sample
+ observations are allowed to have different costs :math:`(C_i)_{i=1}^l`,
+ :math:`\sum_{i=1}^l C_i > 0`;
+
+ * :math:`\nu`-**parametrization**: the penalties are determined by
+ :math:`C_i = \frac{w_i}{\nu \sum_{i=1}^l w_i}`, where :math:`\nu\in (0, 1]`
+ and :math:`(w_i)_{i=1}^l` are non-negative sample weights.
+
+ A straightforward extension of theorems 2-4 of Chang et al. (2013) to the case
+ of non-uniform penalties yields :math:`\sum_{i=1}^l C_i > 1`, or equivalently
+ :math:`\nu < 1`, as the condition distinguishing the case of :math:`R>0`
+ (theorem 4, case 1) from :math:`R=0` (theorem 4, case 2).
+
+ The main benefit of the :math:`\nu`-parametrization is a clearer interpretation
+ and a unified interface with the :ref:`svm_one_class_svm` model. Under the
+ :math:`C`-parametrization the value :math:`\frac{1}{C}` determines the
+ approximate average number of support vectors of the hypersphere, whereas
+ under the :math:`\nu`-parametrization the parameter determines the fraction
+ of outliers in the training sample (similarly to the :ref:`svm_one_class_svm`).
+
+ Note that in a typical run of the SVDD model the weights are set to :math:`w_i = 1`,
+ which is equivalent to the original 2013 SVDD formulation for :math:`C = \frac{1}{\nu l}`.
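A small sketch (not part of this commit) of the reparametrization described above; ``svdd_penalties`` is a hypothetical helper used only for illustration::

    import numpy as np

    def svdd_penalties(sample_weight, nu):
        """Per-observation penalties C_i = w_i / (nu * sum_j w_j)."""
        w = np.asarray(sample_weight, dtype=float)
        return w / (nu * w.sum())

    w = np.ones(50)                              # the typical unweighted case
    C = svdd_penalties(w, nu=0.1)
    print(np.allclose(C, 1.0 / (0.1 * 50)))      # True: equivalent to C = 1 / (nu * l)
    print(C.sum() > 1)                           # True whenever nu < 1 (the R > 0 case)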
The primal problem of Scikit's version of SVDD for the training sample
- :math:`(x_i)_{i=1}^l` with weights :math:`(w_i)_{i=1}^l` is:
+ :math:`(x_i)_{i=1}^l` with weights :math:`(w_i)_{i=1}^l`,
+ :math:`\sum_{i=1}^l w_i > 0`, is:

.. math::

\min_{R,\xi,a} R + \frac{1}{\nu W} \sum_{i=1}^l w_i \xi_i\,,\\

\textrm{subject to } & \|\phi(x_i) - a\|^2 \leq R + \xi_i\,,\\
- & \xi_i \geq 0, i=1, \ldots, l\,,\\
+ & \xi_i \geq 0\,,\, i=1, \ldots, l\,,\\
& R \geq 0\,,
@@ -816,29 +816,38 @@ reduces to an unconstrained convex optimization problem independent of
Note that in this case every observation is an outlier.
In the case when :math:`\nu < 1` the constraint :math:`R\geq 0` is redundant,
- and the dual problem has the form:
+ strong duality holds, and the dual problem has the form:
.. math::

\min_\alpha \frac12 \alpha^T Q\alpha - \frac{\nu W}{2} \sum_{i=1}^l \alpha_i Q_{ii}\,,\\
- \textrm{subject to } & 0 \leq \alpha_i \leq w_i, \ldots, l\,,\\
+ \textrm{subject to } & 0 \leq \alpha_i \leq w_i\,,\, i=1, \ldots, l\,,\\
& e^T \alpha = \nu W\,,
where :math:`e \in \mathbb{R}^{l\times 1}` is the vector of ones and
:math:`Q_{ij} = K(x_i, x_j)` is the kernel Gram matrix.
- The decision function of SVDD is given by:
+ The decision function of the SVDD is given by:
- .. math:: \operatorname{sgn}(R - \|\phi(x) - a\|^2)\,,
+ .. math:: x \mapsto \operatorname{sgn}(R - \|\phi(x) - a\|^2)\,,
where :math:`+1` indicates an inlier and :math:`-1` an outlier. The
distances in the feature space and :math:`R` are computed implicitly through
the coefficients and the optimal value of the objective of the corresponding
dual problem.
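A usage sketch (not part of this commit), assuming the estimator introduced by this change is exposed as ``sklearn.svm.SVDD`` with an interface mirroring ``OneClassSVM``; the class name and parameters are an assumption, not a confirmed API::

    import numpy as np
    from sklearn.svm import SVDD  # assumed name of the estimator added here

    rng = np.random.RandomState(0)
    X = rng.randn(200, 2)

    clf = SVDD(kernel="rbf", gamma=0.5, nu=0.1).fit(X)

    scores = clf.decision_function(X)  # implicitly R - ||phi(x) - a||^2
    labels = clf.predict(X)            # +1 for inliers, -1 for outliers
    print(np.mean(labels == -1))       # roughly a fraction nu of outliers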
+ It is worth noting that in the case of a stationary kernel :math:`K(x,y)=K(x-y)`
+ the SVDD and One-Class SVM models are provably equivalent. Indeed, the values
+ :math:`Q_{ii} = K(x_i, x_i)` in the last term in the dual of the SVDD are all
+ equal to :math:`K(0)`, which makes the whole term independent of :math:`\alpha`.
+ Therefore the objective functions of the dual problems of the One-Class SVM
+ and the SVDD are equivalent up to a constant. This, however, **does not imply**
+ that one model generalizes the other: their solutions just happen to coincide
+ for a particular family of kernels (see :ref:`outlier_detection_ocsvm_vs_svdd`).
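The claimed coincidence is easy to probe numerically; the sketch below (not part of this commit) again assumes a hypothetical ``sklearn.svm.SVDD`` mirroring ``OneClassSVM``, and compares the labels produced by the two models under an RBF (stationary) kernel::

    import numpy as np
    from sklearn.svm import OneClassSVM, SVDD  # SVDD name assumed

    rng = np.random.RandomState(0)
    X = rng.randn(300, 2)
    params = dict(kernel="rbf", gamma=0.5, nu=0.1)

    ocsvm = OneClassSVM(**params).fit(X)
    svdd = SVDD(**params).fit(X)

    X_new = rng.randn(50, 2)
    agree = np.mean(ocsvm.predict(X_new) == svdd.predict(X_new))
    print(agree)  # expected: 1.0 for a stationary kernel such as RBF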
+
.. topic:: References:
* `Support vector data description