[EXA] Fix links in anomaly detection example by albertcthomas · Pull Request #12665 · scikit-learn/scikit-learn · GitHub

Merged
1 commit merged on Nov 24, 2018
44 changes: 23 additions & 21 deletions examples/plot_anomaly_comparison.py
@@ -14,32 +14,34 @@
 except for Local Outlier Factor (LOF) as it has no predict method to be applied
 on new data when it is used for outlier detection.
 
-The :class:`svm.OneClassSVM` is known to be sensitive to outliers and thus does
-not perform very well for outlier detection. This estimator is best suited for
-novelty detection when the training set is not contaminated by outliers.
-That said, outlier detection in high-dimension, or without any assumptions on
-the distribution of the inlying data is very challenging, and a One-class SVM
-might give useful results in these situations depending on the value of its
-hyperparameters.
-
-:class:`covariance.EllipticEnvelope` assumes the data is Gaussian and learns
-an ellipse. It thus degrades when the data is not unimodal. Notice however
-that this estimator is robust to outliers.
-
-:class:`ensemble.IsolationForest` and :class:`neighbors.LocalOutlierFactor`
-seem to perform reasonably well for multi-modal data sets. The advantage of
-:class:`neighbors.LocalOutlierFactor` over the other estimators is shown for
-the third data set, where the two modes have different densities. This
-advantage is explained by the local aspect of LOF, meaning that it only
+The :class:`sklearn.svm.OneClassSVM` is known to be sensitive to outliers and
+thus does not perform very well for outlier detection. This estimator is best
+suited for novelty detection when the training set is not contaminated by
+outliers. That said, outlier detection in high-dimension, or without any
+assumptions on the distribution of the inlying data is very challenging, and a
+One-class SVM might give useful results in these situations depending on the
+value of its hyperparameters.
+
+:class:`sklearn.covariance.EllipticEnvelope` assumes the data is Gaussian and
+learns an ellipse. It thus degrades when the data is not unimodal. Notice
+however that this estimator is robust to outliers.
+
+:class:`sklearn.ensemble.IsolationForest` and
+:class:`sklearn.neighbors.LocalOutlierFactor` seem to perform reasonably well
+for multi-modal data sets. The advantage of
+:class:`sklearn.neighbors.LocalOutlierFactor` over the other estimators is
+shown for the third data set, where the two modes have different densities.
+This advantage is explained by the local aspect of LOF, meaning that it only
 compares the score of abnormality of one sample with the scores of its
 neighbors.
 
 Finally, for the last data set, it is hard to say that one sample is more
 abnormal than another sample as they are uniformly distributed in a
-hypercube. Except for the :class:`svm.OneClassSVM` which overfits a little, all
-estimators present decent solutions for this situation. In such a case, it
-would be wise to look more closely at the scores of abnormality of the samples
-as a good estimator should assign similar scores to all the samples.
+hypercube. Except for the :class:`sklearn.svm.OneClassSVM` which overfits a
+little, all estimators present decent solutions for this situation. In such a
+case, it would be wise to look more closely at the scores of abnormality of
+the samples as a good estimator should assign similar scores to all the
+samples.
 
 While these examples give some intuition about the algorithms, this
 intuition might not apply to very high dimensional data.
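
The docstring above singles out Local Outlier Factor as the one estimator without a predict method for new data in outlier-detection mode. A minimal sketch of what that means in practice, not part of this PR; the toy data and hyperparameter values are illustrative assumptions, not the settings used in plot_anomaly_comparison.py:

    # A minimal sketch (not part of the PR) fitting the four estimators the
    # docstring cross-references. Hyperparameter values are illustrative only.
    import numpy as np
    from sklearn.covariance import EllipticEnvelope
    from sklearn.ensemble import IsolationForest
    from sklearn.neighbors import LocalOutlierFactor
    from sklearn.svm import OneClassSVM

    rng = np.random.RandomState(42)
    X = np.concatenate([
        rng.normal(loc=0.0, scale=1.0, size=(100, 2)),  # inliers
        rng.uniform(low=-4.0, high=4.0, size=(10, 2)),  # scattered outliers
    ])

    estimators = [
        OneClassSVM(nu=0.1, kernel="rbf", gamma=0.1),
        EllipticEnvelope(contamination=0.1),
        IsolationForest(contamination=0.1, random_state=42),
        LocalOutlierFactor(n_neighbors=35, contamination=0.1),
    ]

    for est in estimators:
        if isinstance(est, LocalOutlierFactor):
            # In outlier-detection mode LOF has no predict method for new
            # data; fit_predict labels the training samples directly.
            labels = est.fit_predict(X)
        else:
            labels = est.fit(X).predict(X)  # +1 = inlier, -1 = outlier
        print(type(est).__name__, (labels == -1).sum(), "samples flagged")

Since every estimator labels inliers +1 and outliers -1, the example's comparison can loop over them uniformly apart from the LOF special case.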
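
The flip side, implied by the same docstring lines, is that LOF can score new data once it is switched to novelty detection. A minimal sketch, assuming scikit-learn >= 0.20 (where the novelty parameter was introduced) and an uncontaminated training set:

    # A minimal sketch, assuming scikit-learn >= 0.20: with novelty=True, LOF
    # is trained on clean data and then exposes predict/decision_function for
    # new samples, like the other estimators in the example.
    import numpy as np
    from sklearn.neighbors import LocalOutlierFactor

    rng = np.random.RandomState(0)
    X_train = rng.normal(size=(200, 2))          # assumed uncontaminated
    X_new = np.array([[0.1, -0.2], [6.0, 6.0]])  # one near the bulk, one far away

    lof = LocalOutlierFactor(n_neighbors=20, novelty=True)
    lof.fit(X_train)
    print(lof.predict(X_new))            # e.g. [ 1 -1]
    print(lof.decision_function(X_new))  # larger values = more normal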