|
14 | 14 | except for Local Outlier Factor (LOF) as it has no predict method to be applied
15 | 15 | on new data when it is used for outlier detection.
16 | 16 |
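A minimal sketch of the restriction described above, assuming a scikit-learn
release in which ``LocalOutlierFactor`` accepts ``novelty=True``; the toy data
is invented for illustration::

    import numpy as np
    from sklearn.neighbors import LocalOutlierFactor

    rng = np.random.RandomState(42)
    X_train, X_new = rng.randn(100, 2), rng.randn(10, 2)

    # Outlier detection: labels exist only for the data the model was fit on;
    # calling lof.predict(X_new) here would raise an error.
    lof = LocalOutlierFactor(n_neighbors=20)
    labels = lof.fit_predict(X_train)  # +1 for inliers, -1 for outliers

    # Novelty detection: novelty=True enables predict on unseen data, but the
    # training set is then assumed to be uncontaminated.
    lof_novelty = LocalOutlierFactor(n_neighbors=20, novelty=True).fit(X_train)
    print(lof_novelty.predict(X_new))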
|
17 |    | -The :class:`svm.OneClassSVM` is known to be sensitive to outliers and thus does
18 |    | -not perform very well for outlier detection. This estimator is best suited for
19 |    | -novelty detection when the training set is not contaminated by outliers.
20 |    | -That said, outlier detection in high-dimension, or without any assumptions on
21 |    | -the distribution of the inlying data is very challenging, and a One-class SVM
22 |    | -might give useful results in these situations depending on the value of its
23 |    | -hyperparameters.
24 |    | -
25 |    | -:class:`covariance.EllipticEnvelope` assumes the data is Gaussian and learns
26 |    | -an ellipse. It thus degrades when the data is not unimodal. Notice however
27 |    | -that this estimator is robust to outliers.
28 |    | -
29 |    | -:class:`ensemble.IsolationForest` and :class:`neighbors.LocalOutlierFactor`
30 |    | -seem to perform reasonably well for multi-modal data sets. The advantage of
31 |    | -:class:`neighbors.LocalOutlierFactor` over the other estimators is shown for
32 |    | -the third data set, where the two modes have different densities. This
33 |    | -advantage is explained by the local aspect of LOF, meaning that it only
   | 17 | +The :class:`sklearn.svm.OneClassSVM` is known to be sensitive to outliers and
   | 18 | +thus does not perform very well for outlier detection. This estimator is best
   | 19 | +suited for novelty detection when the training set is not contaminated by
   | 20 | +outliers. That said, outlier detection in high dimensions, or without any
   | 21 | +assumptions on the distribution of the inlying data, is very challenging, and a
   | 22 | +One-class SVM might give useful results in these situations depending on the
   | 23 | +value of its hyperparameters.
   | 24 | +
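A minimal sketch of this novelty-detection workflow; the uncontaminated
training cloud and the hyperparameter values (``nu=0.1``, ``gamma=0.1``) are
assumptions for illustration, not recommended settings::

    import numpy as np
    from sklearn.svm import OneClassSVM

    rng = np.random.RandomState(0)
    X_train = 0.3 * rng.randn(100, 2)               # training set free of outliers
    X_new = np.r_[0.3 * rng.randn(10, 2),           # points like the training data
                  rng.uniform(-4, 4, size=(5, 2))]  # points that should be novelties

    clf = OneClassSVM(kernel="rbf", nu=0.1, gamma=0.1).fit(X_train)
    print(clf.predict(X_new))                       # +1 for inliers, -1 for novelties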
|
   | 25 | +:class:`sklearn.covariance.EllipticEnvelope` assumes the data is Gaussian and
   | 26 | +learns an ellipse. It thus degrades when the data is not unimodal. Notice,
   | 27 | +however, that this estimator is robust to outliers.
   | 28 | +
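A minimal sketch of fitting an elliptic envelope, assuming roughly Gaussian
inliers; the blob parameters and the ``contamination`` value are invented for
illustration::

    import numpy as np
    from sklearn.covariance import EllipticEnvelope

    rng = np.random.RandomState(0)
    inliers = rng.multivariate_normal([0, 0], [[1, 0.5], [0.5, 1]], size=200)
    outliers = rng.uniform(-6, 6, size=(20, 2))
    X = np.r_[inliers, outliers]

    ee = EllipticEnvelope(contamination=0.1)
    labels = ee.fit_predict(X)          # +1 for inliers, -1 for outliers
    print((labels == -1).sum(), "samples flagged as outliers")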
|
   | 29 | +:class:`sklearn.ensemble.IsolationForest` and
   | 30 | +:class:`sklearn.neighbors.LocalOutlierFactor` seem to perform reasonably well
   | 31 | +for multi-modal data sets. The advantage of
   | 32 | +:class:`sklearn.neighbors.LocalOutlierFactor` over the other estimators is
   | 33 | +shown for the third data set, where the two modes have different densities.
   | 34 | +This advantage is explained by the local aspect of LOF, meaning that it only
34 | 35 | compares the score of abnormality of one sample with the scores of its
35 | 36 | neighbors.
36 | 37 |
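A minimal sketch of that local aspect on a two-mode data set whose modes have
different densities, in the spirit of the third data set; the cluster shapes
and the ``contamination`` value are assumptions::

    import numpy as np
    from sklearn.ensemble import IsolationForest
    from sklearn.neighbors import LocalOutlierFactor

    rng = np.random.RandomState(0)
    dense = 0.3 * rng.randn(100, 2) + [2, 2]    # tight, high-density mode
    sparse = 1.5 * rng.randn(100, 2) - [4, 4]   # spread-out, low-density mode
    X = np.r_[dense, sparse]

    iso_labels = IsolationForest(contamination=0.05, random_state=0).fit_predict(X)
    lof_labels = LocalOutlierFactor(n_neighbors=20, contamination=0.05).fit_predict(X)
    # LOF compares each sample only with its neighbors, so members of the sparse
    # mode are not flagged merely for lying in a globally low-density region.
    print((iso_labels == -1).sum(), (lof_labels == -1).sum())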
|
37 | 38 | Finally, for the last data set, it is hard to say that one sample is more
38 | 39 | abnormal than another sample as they are uniformly distributed in a
39 |    | -hypercube. Except for the :class:`svm.OneClassSVM` which overfits a little, all
40 |    | -estimators present decent solutions for this situation. In such a case, it
41 |    | -would be wise to look more closely at the scores of abnormality of the samples
42 |    | -as a good estimator should assign similar scores to all the samples.
   | 40 | +hypercube. Except for the :class:`sklearn.svm.OneClassSVM`, which overfits a
   | 41 | +little, all estimators present decent solutions for this situation. In such a
   | 42 | +case, it would be wise to look more closely at the scores of abnormality of
   | 43 | +the samples, as a good estimator should assign similar scores to all the
   | 44 | +samples.
43 | 45 |
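A minimal sketch of inspecting the raw scores of abnormality rather than the
binary labels, assuming ``IsolationForest`` and its ``score_samples`` method
(available in recent scikit-learn releases); the uniform data mimics the
hypercube case::

    import numpy as np
    from sklearn.ensemble import IsolationForest

    rng = np.random.RandomState(0)
    X = rng.uniform(-1, 1, size=(500, 2))   # samples from a uniform hypercube

    iso = IsolationForest(random_state=0).fit(X)
    scores = iso.score_samples(X)           # higher means more normal
    print("score spread:", scores.max() - scores.min())
    # A narrow spread indicates the estimator assigns similar scores to all
    # samples, as a good estimator should here.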
|
44 | 46 | While these examples give some intuition about the algorithms, this
45 | 47 | intuition might not apply to very high-dimensional data.
|
|