DOC: Reworked documentation related to SVDD · scikit-learn/scikit-learn@b67596b

Commit b67596b

Author: Nikolay Mayorov (committed)
DOC: Reworked documentation related to SVDD
1 parent fa716ff commit b67596b

3 files changed, +106 -79 lines changed

doc/modules/outlier_detection.rst

Lines changed: 87 additions & 59 deletions
@@ -60,31 +60,18 @@ There are two SVM-based approaches for that purpose:
 2. :class:`svm.SVDD` finds a sphere with a minimum radius which encloses
    the data.

-Both methods can implicitly work in transformed high-dimensional space using
-the kernel trick, the RBF kernel is used by default. :class:`svm.OneClassSVM`
-provides :math:`\nu` parameter for controlling the trade off between the
-margin and the number of outliers during training, namely it is an upper bound
-on the fraction of outliers in a training set or probability of finding a
-new, but regular, observation outside the frontier. :clss:`svm.SVDD` provides a
-similar parameter :math:`C = 1 / (\nu l)`, where :math:`l` is the number of
-samples, such that :math:`1/C` approximately equals the number of outliers in
-a training set.
-
-.. topic:: References:
-
-    * Bernhard Schölkopf et al, `Estimating the support of a high-dimensional
-      distribution <http://dl.acm.org/citation.cfm?id=1119749>`_, Neural
-      computation 13.7 (2001): 1443-1471.
-    * David M. J. Tax and Robert P. W. Duin, `Support vector data description
-      <http://dl.acm.org/citation.cfm?id=960109>`_, Machine Learning,
-      54(1):45-66, 2004.
-
-.. topic:: Examples:
-
-    * See :ref:`example_svm_plot_oneclass.py` for visualizing the
-      frontier learned around some data by :class:`svm.OneClassSVM`.
-    * See :ref:`example_svm_plot_oneclass_vs_svdd.py` to get the idea about
-      the difference between the two approaches.
+Both methods can implicitly work in a transformed high-dimensional space using
+the kernel trick. :class:`svm.OneClassSVM` provides :math:`\nu` parameter for
+controlling the trade off between the margin and the number of outliers during
+training, namely it is an upper bound on the fraction of outliers in a training
+set or probability of finding a new, but regular, observation outside the
+frontier. :clss:`svm.SVDD` provides a similar parameter
+:math:`C = 1 / (\nu l)`, where :math:`l` is the number of samples, such that
+:math:`1/C` approximately equals the number of outliers in a training set.
+
+Both methods are equivalent if a) the kernel used depends only on the
+difference between two vectors, one example is RBF kernel, and
+b) :math:`C = 1 / (\nu l)`.

 .. figure:: ../auto_examples/svm/images/plot_oneclass_001.png
    :target: ../auto_examples/svm/plot_oneclasse.html

@@ -96,6 +83,22 @@ a training set.
    :align: center
    :scale: 75

+.. topic:: Examples:
+
+    * See :ref:`example_svm_plot_oneclass.py` for visualizing the
+      frontier learned around some data by :class:`svm.OneClassSVM`.
+    * See :ref:`example_svm_plot_oneclass_vs_svdd.py` to get the idea about
+      the difference between the two approaches.
+
+.. topic:: References:
+
+    * Bernhard Schölkopf et al, `Estimating the Support of a High-Dimensional
+      Distribution <http://dl.acm.org/citation.cfm?id=1119749>`_, Neural
+      computation 13.7 (2001): 1443-1471.
+    * David M. J. Tax and Robert P. W. Duin, `Support Vector Data Description
+      <http://dl.acm.org/citation.cfm?id=960109>`_, Machine Learning,
+      54(1):45-66, 2004.
+

 Outlier Detection
 =================
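The :math:`\nu` / :math:`C` relation spelled out in the first hunk above can be
checked directly. A minimal sketch (editor's illustration, not part of the diff),
assuming the ``svm.SVDD`` class from this branch (it is not available in released
scikit-learn) and an RBF kernel; with ``C = 1 / (nu * n_samples)`` the two
estimators should give essentially the same predictions::

    import numpy as np
    from sklearn import svm

    rng = np.random.RandomState(42)
    X = 0.3 * rng.randn(100, 2)      # 100 regular training samples

    nu = 0.1                         # upper bound on the fraction of outliers
    C = 1.0 / (nu * X.shape[0])      # the relation C = 1 / (nu * l) from the docs

    oc_svm = svm.OneClassSVM(nu=nu, kernel="rbf", gamma=0.1).fit(X)
    svdd = svm.SVDD(C=C, kernel="rbf", gamma=0.1).fit(X)  # SVDD exists only on this branch

    X_test = rng.uniform(-1, 1, size=(20, 2))
    # With a translation-invariant (RBF) kernel and C = 1 / (nu * l), both
    # models should label the test points identically (+1 inlier, -1 outlier).
    print(np.mean(oc_svm.predict(X_test) == svdd.predict(X_test)))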
@@ -190,48 +193,73 @@ This strategy is illustrated below.
       Data Mining, 2008. ICDM'08. Eighth IEEE International Conference on.

-Comparison of different approaches
-----------------------------------
+One-class SVM versus Elliptic Envelope versus Isolation Forest
+--------------------------------------------------------------

-Strictly-speaking, the SVM-based methods are not designed for outlier
-detection, but rather for novelty detection: its training set should not be
-contaminated by outliers as it may fit them. That said, outlier detection in
-high-dimension, or without any assumptions on the distribution of the inlying
-data is very challenging, and a SVM-based methods give useful results in these
-situations.
+Strictly-speaking, the One-class SVM is not an outlier-detection method,
+but a novelty-detection method: its training set should not be
+contaminated by outliers as it may fit them. That said, outlier detection
+in high-dimension, or without any assumptions on the distribution of the
+inlying data is very challenging, and a One-class SVM gives useful
+results in these situations.

 The examples below illustrate how the performance of the
-:class:`covariance.EllipticEnvelope` degrades as the data is less and less
-unimodal, and other methods become more beneficial. Note, that the parameters
-of :class:`svm.OneClassSVM` and :class:`svm.SVDD` are set to achieve their
-equivalence, i. e. :math:`C = 1 / (\nu l)`.
+:class:`covariance.EllipticEnvelope` degrades as the data is less and
+less unimodal. The :class:`svm.OneClassSVM` works better on data with
+multiple modes and :class:`ensemble.IsolationForest` performs well in all
+cases.

-|
+:class:`svm.SVDD` is not presented in comparison as it works the same as
+:class:`svm.OneClassSVM` when using RBF kernel.

-- For a inlier mode well-centered and elliptic all methods give approximately
-  equally good results.
-
-.. figure:: ../auto_examples/covariance/images/plot_outlier_detection_001.png
+.. |outlier1| image:: ../auto_examples/covariance/images/plot_outlier_detection_001.png
    :target: ../auto_examples/covariance/plot_outlier_detection.html
-   :align: center
-   :scale: 75%
+   :scale: 50%

-- As the inlier distribution becomes bimodal,
-  :class:`covariance.EllipticEnvelope` does not fit well the inliers. However,
-  we can see that other methods also have difficulties to detect the two modes,
-  but generally perform equally well.
+.. |outlier2| image:: ../auto_examples/covariance/images/plot_outlier_detection_002.png
+   :target: ../auto_examples/covariance/plot_outlier_detection.html
+   :scale: 50%

-.. figure:: ../auto_examples/covariance/images/plot_outlier_detection_002.png
+.. |outlier3| image:: ../auto_examples/covariance/images/plot_outlier_detection_003.png
    :target: ../auto_examples/covariance/plot_outlier_detection.html
-   :align: center
-   :scale: 75%
+   :scale: 50%
+
+.. list-table:: **Comparing One-class SVM approach, and elliptic envelope**
+   :widths: 40 60
+
+   *
+      - For a inlier mode well-centered and elliptic, the
+        :class:`svm.OneClassSVM` is not able to benefit from the
+        rotational symmetry of the inlier population. In addition, it
+        fits a bit the outliers present in the training set. On the
+        opposite, the decision rule based on fitting an
+        :class:`covariance.EllipticEnvelope` learns an ellipse, which
+        fits well the inlier distribution. The :class:`ensemble.IsolationForest`
+        performs as well.
+      - |outlier1|
+
+   *
+      - As the inlier distribution becomes bimodal, the
+        :class:`covariance.EllipticEnvelope` does not fit well the
+        inliers. However, we can see that both :class:`ensemble.IsolationForest`
+        and :class:`svm.OneClassSVM` have difficulties to detect the two modes,
+        and that the :class:`svm.OneClassSVM`
+        tends to overfit: because it has not model of inliers, it
+        interprets a region where, by chance some outliers are
+        clustered, as inliers.
+      - |outlier2|
+
+   *
+      - If the inlier distribution is strongly non Gaussian, the
+        :class:`svm.OneClassSVM` is able to recover a reasonable
+        approximation as well as :class:`ensemble.IsolationForest`,
+        whereas the :class:`covariance.EllipticEnvelope` completely fails.
+      - |outlier3|

-- As the inlier distribution gets strongly non-Gaussian,
-  :class:`covariance.EllipticEnvelope` starts to perform inadequate. Other
-  methods give a reasonable representation, with
-  :class:`ensemble.IsolationForest` having the least amount of errors.
+.. topic:: Examples:

-.. figure:: ../auto_examples/covariance/images/plot_outlier_detection_003.png
-   :target: ../auto_examples/covariance/plot_outlier_detection.html
-   :align: center
-   :scale: 75%
+    * See :ref:`example_covariance_plot_outlier_detection.py` for a
+      comparison of the :class:`svm.OneClassSVM` (tuned to perform like
+      an outlier detection method), the :class:`ensemble.IsolationForest`
+      and a covariance-based outlier
+      detection with :class:`covariance.MinCovDet`.
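To make the comparison above concrete, here is an editor's sketch (not part of
the diff) that fits the three estimators discussed in this section on a small
contaminated, bimodal data set and counts labelling errors; the parameter
choices mirror the example script changed further down::

    import numpy as np
    from sklearn import svm
    from sklearn.covariance import EllipticEnvelope
    from sklearn.ensemble import IsolationForest

    rng = np.random.RandomState(0)
    n_inliers, n_outliers = 200, 50
    outliers_fraction = n_outliers / float(n_inliers + n_outliers)

    # Bimodal inlier distribution plus uniformly scattered outliers.
    X_in = np.r_[0.3 * rng.randn(n_inliers // 2, 2) - 2,
                 0.3 * rng.randn(n_inliers // 2, 2) + 2]
    X_out = rng.uniform(low=-6, high=6, size=(n_outliers, 2))
    X = np.r_[X_in, X_out]
    y_true = np.r_[np.ones(n_inliers), -np.ones(n_outliers)]

    detectors = {
        "One-Class SVM": svm.OneClassSVM(nu=0.95 * outliers_fraction + 0.05,
                                         kernel="rbf", gamma=0.1),
        "Robust Covariance Estimator": EllipticEnvelope(
            contamination=outliers_fraction),
        "Isolation Forest": IsolationForest(max_samples=X.shape[0],
                                            random_state=rng),
    }

    for name, clf in detectors.items():
        y_pred = clf.fit(X).predict(X)   # +1 for inliers, -1 for outliers
        print("%s: %d labelling errors" % (name, (y_pred != y_true).sum()))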

doc/modules/svm.rst

Lines changed: 7 additions & 6 deletions
@@ -327,10 +327,10 @@ floating point values instead of integer values::

 .. _svm_outlier_detection:

-Outlier and novelty detection
+Novelty and outlier detection
 =============================

-Support vector machines can be used for detecting novely and outliers in
+Support vector machines can be used for detecting novelty and outliers in
 unlabeled data sets. That is, given a set of samples, detect the soft boundary
 of that set so as to classify new points as belonging to that set or not.

@@ -359,21 +359,22 @@ See section :ref:`outlier_detection` for more details on their usage.
 .. topic:: Examples:

     * See :ref:`example_svm_plot_oneclass.py` for visualizing the
-      frontier learned around some data by :class:`svm.OneClassSVM`.
+      frontier learned around some data by :class:`OneClassSVM`.
     * See :ref:`example_svm_plot_oneclass_vs_svdd.py` to get the idea about
       the difference between the two approaches.
     * :ref:`example_applications_plot_species_distribution_modeling.py`

 .. topic:: References:

-    * Bernhard Schölkopf et al, `Estimating the support of a high-dimensional
-      distribution <http://dl.acm.org/citation.cfm?id=1119749>`_, Neural
+    * Bernhard Schölkopf et al, `Estimating the Support of a High-Dimensional
+      Distribution <http://dl.acm.org/citation.cfm?id=1119749>`_, Neural
       computation 13.7 (2001): 1443-1471.
-    * David M. J. Tax and Robert P. W. Duin, `Support vector data description
+    * David M. J. Tax and Robert P. W. Duin, `Support Vector Data Description
       <http://dl.acm.org/citation.cfm?id=960109>`_, Machine Learning,
       54(1):45-66, 2004.

+
 Complexity
 ==========
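The "soft boundary" idea in the first hunk of this file amounts to fitting on
clean data and then classifying new points as inside or outside the learned
frontier. A minimal sketch (editor's illustration) using only the released
``OneClassSVM`` API::

    import numpy as np
    from sklearn.svm import OneClassSVM

    rng = np.random.RandomState(0)
    X_train = 0.3 * rng.randn(100, 2) + 2             # regular observations only
    X_new = np.r_[0.3 * rng.randn(10, 2) + 2,         # more regular observations
                  rng.uniform(-4, 4, size=(10, 2))]   # plus some abnormal ones

    clf = OneClassSVM(nu=0.1, kernel="rbf", gamma=0.1).fit(X_train)
    print(clf.predict(X_new))   # +1 = inside the learned frontier, -1 = outside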

examples/covariance/plot_outlier_detection.py

Lines changed: 12 additions & 14 deletions
@@ -1,7 +1,7 @@
 """
-======================================
-Outlier detection with several methods
-======================================
+==========================================
+Outlier detection with several methods.
+==========================================

 When the amount of contamination is known, this example illustrates three
 different ways of performing :ref:`outlier_detection`:

@@ -45,15 +45,14 @@
 outliers_fraction = 0.25
 clusters_separation = [0, 1, 2]

-nu = 1.25 * outliers_fraction
-C = 1 / (nu * n_samples)
-
 # define two outlier detection tools to be compared
 classifiers = {
-    "One-Class SVM": svm.OneClassSVM(nu=nu, kernel="rbf", gamma=0.1),
-    "SVDD": svm.SVDD(C=C, kernel='rbf', gamma=0.1),
-    "robust covariance estimator": EllipticEnvelope(contamination=.1),
-    "Isolation Forest": IsolationForest(max_samples=n_samples, random_state=rng)}
+    "One-Class SVM": svm.OneClassSVM(nu=0.95 * outliers_fraction + 0.05,
+                                     kernel="rbf", gamma=0.1),
+    "Robust Covariance Estimator": EllipticEnvelope(contamination=0.1),
+    "Isolation Forest": IsolationForest(max_samples=n_samples,
+                                        random_state=rng)
+    }

 # Compare given classifiers under given settings
 xx, yy = np.meshgrid(np.linspace(-7, 7, 500), np.linspace(-7, 7, 500))

@@ -85,7 +84,7 @@
     # plot the levels lines and the points
     Z = clf.decision_function(np.c_[xx.ravel(), yy.ravel()])
     Z = Z.reshape(xx.shape)
-    subplot = plt.subplot(1, 4, i + 1)
+    subplot = plt.subplot(1, 3, i + 1)
     subplot.contourf(xx, yy, Z, levels=np.linspace(Z.min(), threshold, 7),
                      cmap=plt.cm.Blues_r)
     a = subplot.contour(xx, yy, Z, levels=[threshold],

@@ -99,11 +98,10 @@
         [a.collections[0], b, c],
         ['Decision function', 'True inliers', 'True outliers'],
         prop=matplotlib.font_manager.FontProperties(size=11))
-    subplot.set_xlabel("%s\n(errors: %d)" % (clf_name, n_errors))
+    subplot.set_xlabel("%s (errors: %d)" % (clf_name, n_errors))
     subplot.set_xlim((-7, 7))
     subplot.set_ylim((-7, 7))
 plt.suptitle("Outlier detection")
-plt.subplots_adjust(left=0.04, bottom=0.15, right=0.96, top=0.94,
-                    wspace=0.1, hspace=0.26)
+plt.subplots_adjust(0.04, 0.1, 0.96, 0.94, 0.1, 0.26)

 plt.show()
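The hunks above reference a ``threshold`` variable that is computed earlier in
the script (that part is not shown in this diff). As a hedged illustration of
the idea, assuming the threshold is taken at the percentile of the decision
function given by the known contamination, and noting that the
``nu=0.95 * outliers_fraction + 0.05`` heuristic simply keeps ``nu`` inside its
valid ``(0, 1]`` range::

    import numpy as np
    from scipy import stats
    from sklearn.svm import OneClassSVM

    rng = np.random.RandomState(42)
    outliers_fraction = 0.25
    # 90 regular samples plus 30 uniformly scattered outliers (25% contamination).
    X = np.r_[0.3 * rng.randn(90, 2), rng.uniform(-4, 4, size=(30, 2))]

    clf = OneClassSVM(nu=0.95 * outliers_fraction + 0.05, kernel="rbf", gamma=0.1)
    clf.fit(X)
    scores = clf.decision_function(X).ravel()

    # Cut the decision function so that `outliers_fraction` of the training
    # points fall below the threshold and are flagged as outliers.
    threshold = stats.scoreatpercentile(scores, 100 * outliers_fraction)
    n_flagged = (scores < threshold).sum()
    print("flagged as outliers: %d out of %d" % (n_flagged, len(X)))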
