@@ -60,31 +60,18 @@ There are two SVM-based approaches for that purpose:
2. :class:`svm.SVDD` finds a sphere with a minimum radius which encloses
   the data.

- Both methods can implicitly work in transformed high-dimensional space using
- the kernel trick, the RBF kernel is used by default. :class:`svm.OneClassSVM`
- provides :math:`\nu` parameter for controlling the trade off between the
- margin and the number of outliers during training, namely it is an upper bound
- on the fraction of outliers in a training set or probability of finding a
- new, but regular, observation outside the frontier. :class:`svm.SVDD` provides a
- similar parameter :math:`C = 1 / (\nu l)`, where :math:`l` is the number of
- samples, such that :math:`1 / C` approximately equals the number of outliers in
- a training set.
-
- .. topic:: References:
-
-     * Bernhard Schölkopf et al, `Estimating the support of a high-dimensional
-       distribution <http://dl.acm.org/citation.cfm?id=1119749>`_, Neural
-       Computation 13.7 (2001): 1443-1471.
-     * David M. J. Tax and Robert P. W. Duin, `Support vector data description
-       <http://dl.acm.org/citation.cfm?id=960109>`_, Machine Learning,
-       54(1):45-66, 2004.
-
- .. topic:: Examples:
-
-     * See :ref:`example_svm_plot_oneclass.py` for visualizing the
-       frontier learned around some data by :class:`svm.OneClassSVM`.
-     * See :ref:`example_svm_plot_oneclass_vs_svdd.py` to get the idea about
-       the difference between the two approaches.
+ Both methods can implicitly work in a transformed high-dimensional space using
+ the kernel trick. :class:`svm.OneClassSVM` provides the :math:`\nu` parameter
+ for controlling the trade-off between the margin and the number of outliers
+ during training: it is an upper bound on the fraction of outliers in a training
+ set, or equivalently the probability of finding a new, but regular, observation
+ outside the frontier. :class:`svm.SVDD` provides a similar parameter
+ :math:`C = 1 / (\nu l)`, where :math:`l` is the number of samples, such that
+ :math:`1 / C` approximately equals the number of outliers in a training set.
+
+ Both methods are equivalent if a) the kernel depends only on the difference
+ between two vectors (the RBF kernel is one example), and
+ b) :math:`C = 1 / (\nu l)`.
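
A minimal sketch of this equivalence, assuming the proposed ``SVDD`` estimator
exposes ``C``, ``kernel`` and ``gamma`` constructor parameters mirroring
:class:`svm.OneClassSVM` (``SVDD`` is introduced by this change and is not part
of a released scikit-learn, so its exact signature is an assumption)::

    import numpy as np
    from sklearn import svm

    rng = np.random.RandomState(42)
    X = rng.randn(200, 2)          # training data, mostly "regular"

    nu = 0.1                       # upper bound on the fraction of outliers
    l = X.shape[0]                 # number of training samples

    # One-Class SVM with an RBF kernel, controlled by nu.
    oc_svm = svm.OneClassSVM(nu=nu, kernel="rbf", gamma=0.1).fit(X)

    # SVDD with the matching regularization C = 1 / (nu * l); with an RBF
    # kernel both decision rules should agree (constructor assumed above).
    svdd = svm.SVDD(C=1.0 / (nu * l), kernel="rbf", gamma=0.1).fit(X)

    # Both estimators label inliers +1 and outliers -1, so the predictions
    # should match on (almost) every point.
    print(np.mean(oc_svm.predict(X) == svdd.predict(X)))
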
.. figure:: ../auto_examples/svm/images/plot_oneclass_001.png
   :target: ../auto_examples/svm/plot_oneclass.html
@@ -96,6 +83,22 @@ a training set.
   :align: center
   :scale: 75

+ .. topic:: Examples:
+
+     * See :ref:`example_svm_plot_oneclass.py` for visualizing the
+       frontier learned around some data by :class:`svm.OneClassSVM`.
+     * See :ref:`example_svm_plot_oneclass_vs_svdd.py` to get the idea about
+       the difference between the two approaches.
+
+ .. topic:: References:
+
+     * Bernhard Schölkopf et al, `Estimating the Support of a High-Dimensional
+       Distribution <http://dl.acm.org/citation.cfm?id=1119749>`_, Neural
+       Computation 13.7 (2001): 1443-1471.
+     * David M. J. Tax and Robert P. W. Duin, `Support Vector Data Description
+       <http://dl.acm.org/citation.cfm?id=960109>`_, Machine Learning,
+       54(1):45-66, 2004.
+
Outlier Detection
=================
@@ -190,48 +193,73 @@ This strategy is illustrated below.
Data Mining, 2008. ICDM'08. Eighth IEEE International Conference on.
- Comparison of different approaches
- ----------------------------------
+ One-class SVM versus Elliptic Envelope versus Isolation Forest
+ --------------------------------------------------------------

- Strictly-speaking, the SVM-based methods are not designed for outlier
- detection, but rather for novelty detection: their training set should not be
- contaminated by outliers as they may fit them. That said, outlier detection in
- high dimension, or without any assumptions on the distribution of the inlying
- data, is very challenging, and SVM-based methods give useful results in these
- situations.
+ Strictly speaking, the One-class SVM is not an outlier-detection method,
+ but a novelty-detection method: its training set should not be
+ contaminated by outliers as it may fit them. That said, outlier detection
+ in high dimension, or without any assumptions on the distribution of the
+ inlying data, is very challenging, and a One-class SVM gives useful
+ results in these situations.
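
A minimal sketch of this novelty-detection workflow (the data and parameter
values are illustrative assumptions): the estimator is fitted on clean data
only and then queried on unseen observations::

    import numpy as np
    from sklearn import svm

    rng = np.random.RandomState(0)
    X_train = 0.3 * rng.randn(100, 2)              # outlier-free training set

    # New observations: mostly regular points plus a few abnormal ones.
    X_new = np.r_[0.3 * rng.randn(20, 2),
                  rng.uniform(low=-4, high=4, size=(5, 2))]

    clf = svm.OneClassSVM(nu=0.1, kernel="rbf", gamma=0.1)
    clf.fit(X_train)                               # must not contain outliers

    # +1 for points inside the learned frontier, -1 outside of it.
    print(clf.predict(X_new))
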
The examples below illustrate how the performance of the
- :class:`covariance.EllipticEnvelope` degrades as the data is less and less
- unimodal, and other methods become more beneficial. Note that the parameters
- of :class:`svm.OneClassSVM` and :class:`svm.SVDD` are set to achieve their
- equivalence, i.e. :math:`C = 1 / (\nu l)`.
+ :class:`covariance.EllipticEnvelope` degrades as the data is less and
+ less unimodal. The :class:`svm.OneClassSVM` works better on data with
+ multiple modes, and :class:`ensemble.IsolationForest` performs well in
+ all cases.
- |
+ :class:`svm.SVDD` is not included in the comparison, as it behaves the same
+ as :class:`svm.OneClassSVM` when the RBF kernel is used.

- - For an inlier mode well-centered and elliptic all methods give approximately
-   equally good results.
-
- .. figure:: ../auto_examples/covariance/images/plot_outlier_detection_001.png
+ .. |outlier1| image:: ../auto_examples/covariance/images/plot_outlier_detection_001.png
    :target: ../auto_examples/covariance/plot_outlier_detection.html
-   :align: center
-   :scale: 75%
+   :scale: 50%

- - As the inlier distribution becomes bimodal,
-   :class:`covariance.EllipticEnvelope` does not fit well the inliers. However,
-   we can see that other methods also have difficulties to detect the two modes,
-   but generally perform equally well.
+ .. |outlier2| image:: ../auto_examples/covariance/images/plot_outlier_detection_002.png
+    :target: ../auto_examples/covariance/plot_outlier_detection.html
+    :scale: 50%
- .. figure:: ../auto_examples/covariance/images/plot_outlier_detection_002.png
+ .. |outlier3| image:: ../auto_examples/covariance/images/plot_outlier_detection_003.png
    :target: ../auto_examples/covariance/plot_outlier_detection.html
-   :align: center
-   :scale: 75%
+   :scale: 50%
+
+ .. list-table:: **Comparing One-class SVM, Elliptic Envelope and Isolation Forest**
+    :widths: 40 60
+
+    *
+      - For an inlier mode that is well-centered and elliptic, the
+        :class:`svm.OneClassSVM` is not able to benefit from the
+        rotational symmetry of the inlier population. In addition, it
+        slightly fits the outliers present in the training set. On the
+        contrary, the decision rule based on fitting a
+        :class:`covariance.EllipticEnvelope` learns an ellipse, which
+        fits the inlier distribution well. The
+        :class:`ensemble.IsolationForest` performs just as well.
+      - |outlier1|
+
+    *
+      - As the inlier distribution becomes bimodal, the
+        :class:`covariance.EllipticEnvelope` does not fit the inliers
+        well. However, we can see that both :class:`ensemble.IsolationForest`
+        and :class:`svm.OneClassSVM` have difficulties detecting the two
+        modes, and that the :class:`svm.OneClassSVM` tends to overfit:
+        because it has no model of inliers, it interprets a region where,
+        by chance, some outliers are clustered as inliers.
+      - |outlier2|
+
+    *
+      - If the inlier distribution is strongly non-Gaussian, the
+        :class:`svm.OneClassSVM` is able to recover a reasonable
+        approximation, as is :class:`ensemble.IsolationForest`,
+        whereas the :class:`covariance.EllipticEnvelope` completely fails.
+      - |outlier3|

- - As the inlier distribution gets strongly non-Gaussian,
-   :class:`covariance.EllipticEnvelope` starts to perform inadequately. Other
-   methods give a reasonable representation, with
-   :class:`ensemble.IsolationForest` having the least amount of errors.
+ .. topic:: Examples:
+
- .. figure:: ../auto_examples/covariance/images/plot_outlier_detection_003.png
-    :target: ../auto_examples/covariance/plot_outlier_detection.html
-    :align: center
-    :scale: 75%
+     * See :ref:`example_covariance_plot_outlier_detection.py` for a
+       comparison of the :class:`svm.OneClassSVM` (tuned to perform like
+       an outlier-detection method), the :class:`ensemble.IsolationForest`,
+       and a covariance-based outlier detection with
+       :class:`covariance.MinCovDet`.
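
A minimal sketch of such a comparison on bimodal inliers (the dataset shape,
contamination level, and parameter values are illustrative assumptions, not
taken from the example script)::

    import numpy as np
    from sklearn.covariance import EllipticEnvelope
    from sklearn.ensemble import IsolationForest
    from sklearn.svm import OneClassSVM

    rng = np.random.RandomState(42)
    # Two inlier modes plus a small batch of uniform outliers.
    X = np.r_[0.3 * rng.randn(100, 2) + 2,
              0.3 * rng.randn(100, 2) - 2,
              rng.uniform(low=-6, high=6, size=(20, 2))]

    outliers_fraction = 20 / 220.0
    estimators = {
        "Elliptic Envelope": EllipticEnvelope(contamination=outliers_fraction),
        "One-Class SVM": OneClassSVM(nu=outliers_fraction, gamma=0.1),
        "Isolation Forest": IsolationForest(contamination=outliers_fraction,
                                            random_state=rng),
    }
    for name, estimator in estimators.items():
        # +1 for predicted inliers, -1 for predicted outliers.
        labels = estimator.fit(X).predict(X)
        print(name, "flags", (labels == -1).sum(), "points as outliers")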