except for Local Outlier Factor (LOF) as it has no predict method to be applied
on new data when it is used for outlier detection.

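This distinction is visible in code: in its default outlier-detection mode,
:class:`neighbors.LocalOutlierFactor` only offers ``fit_predict`` on the
training data, while ``novelty=True`` enables ``predict`` on unseen samples.
A minimal sketch with made-up data::

    import numpy as np
    from sklearn.neighbors import LocalOutlierFactor

    X = np.random.RandomState(42).randn(100, 2)

    # Outlier detection (the default): labels exist only for the training data.
    lof = LocalOutlierFactor()
    labels = lof.fit_predict(X)  # +1 = inlier, -1 = outlier
    # Calling lof.predict(...) in this mode raises AttributeError.

    # Novelty detection: novelty=True enables predict() on unseen data,
    # assuming the training set is not contaminated by outliers.
    lof_novelty = LocalOutlierFactor(novelty=True).fit(X)
    X_new = np.random.RandomState(0).randn(5, 2)
    new_labels = lof_novelty.predict(X_new)
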
-The :class:`sklearn.svm.OneClassSVM` is known to be sensitive to outliers and
-thus does not perform very well for outlier detection. This estimator is best
-suited for novelty detection when the training set is not contaminated by
-outliers. That said, outlier detection in high-dimension, or without any
-assumptions on the distribution of the inlying data is very challenging, and a
-One-class SVM might give useful results in these situations depending on the
-value of its hyperparameters.
-
-:class:`sklearn.covariance.EllipticEnvelope` assumes the data is Gaussian and
-learns an ellipse. It thus degrades when the data is not unimodal. Notice
-however that this estimator is robust to outliers.
-
-:class:`sklearn.ensemble.IsolationForest` and
-:class:`sklearn.neighbors.LocalOutlierFactor` seem to perform reasonably well
-for multi-modal data sets. The advantage of
-:class:`sklearn.neighbors.LocalOutlierFactor` over the other estimators is
-shown for the third data set, where the two modes have different densities.
-This advantage is explained by the local aspect of LOF, meaning that it only
+The :class:`svm.OneClassSVM` is known to be sensitive to outliers and thus does
+not perform very well for outlier detection. This estimator is best suited for
+novelty detection when the training set is not contaminated by outliers.
+That said, outlier detection in high-dimension, or without any assumptions on
+the distribution of the inlying data is very challenging, and a One-class SVM
+might give useful results in these situations depending on the value of its
+hyperparameters.
+
+:class:`covariance.EllipticEnvelope` assumes the data is Gaussian and learns
+an ellipse. It thus degrades when the data is not unimodal. Notice however
+that this estimator is robust to outliers.
+
+:class:`ensemble.IsolationForest` and :class:`neighbors.LocalOutlierFactor`
+seem to perform reasonably well for multi-modal data sets. The advantage of
+:class:`neighbors.LocalOutlierFactor` over the other estimators is shown for
+the third data set, where the two modes have different densities. This
+advantage is explained by the local aspect of LOF, meaning that it only
compares the score of abnormality of one sample with the scores of its
neighbors.

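As a rough sketch of the comparison above (the toy data and hyperparameter
values here are illustrative, not taken from the example script)::

    import numpy as np
    from sklearn.covariance import EllipticEnvelope
    from sklearn.ensemble import IsolationForest
    from sklearn.neighbors import LocalOutlierFactor
    from sklearn.svm import OneClassSVM

    rng = np.random.RandomState(42)
    # Two inlier modes with different densities, plus some uniform noise.
    X = np.concatenate([
        0.3 * rng.randn(100, 2),                      # dense mode
        rng.randn(50, 2) + [4, 4],                    # sparse mode
        rng.uniform(low=-6, high=10, size=(15, 2)),   # outliers
    ])

    estimators = {
        "One-Class SVM": OneClassSVM(nu=0.1, gamma=0.1),
        "Robust covariance": EllipticEnvelope(contamination=0.1),
        "Isolation Forest": IsolationForest(contamination=0.1, random_state=42),
        "Local Outlier Factor": LocalOutlierFactor(n_neighbors=35,
                                                   contamination=0.1),
    }
    for name, estimator in estimators.items():
        # LOF has no predict() in this mode, so use fit_predict for all.
        y_pred = estimator.fit_predict(X)
        print(name, "flagged", (y_pred == -1).sum(), "samples as outliers")
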
Finally, for the last data set, it is hard to say that one sample is more
abnormal than another sample as they are uniformly distributed in a
-hypercube. Except for the :class:`sklearn.svm.OneClassSVM` which overfits a
-little, all estimators present decent solutions for this situation. In such a
-case, it would be wise to look more closely at the scores of abnormality of
-the samples as a good estimator should assign similar scores to all the
-samples.
+hypercube. Except for the :class:`svm.OneClassSVM` which overfits a little, all
+estimators present decent solutions for this situation. In such a case, it
+would be wise to look more closely at the scores of abnormality of the samples
+as a good estimator should assign similar scores to all the samples.
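
A minimal way to sketch that check, using :class:`ensemble.IsolationForest` on
made-up uniform data (``score_samples`` returns higher values for more normal
samples)::

    import numpy as np
    from sklearn.ensemble import IsolationForest

    rng = np.random.RandomState(0)
    X = rng.uniform(low=-1, high=1, size=(200, 2))  # uniform hypercube

    iso = IsolationForest(random_state=0).fit(X)
    scores = iso.score_samples(X)  # higher = more normal
    # A good estimator should assign similar scores to all samples here,
    # so the spread of the scores should stay small.
    print("score spread:", scores.max() - scores.min())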

While these examples give some intuition about the algorithms, this
intuition might not apply to very high dimensional data.