@@ -125,38 +125,59 @@ the kernel is known as the Gaussian kernel of variance :math:`\sigma^2`.
Matérn kernel
-------------
- The function :func:`matern_kernel` is a generalization of the RBF kernel. It
- has an additional parameter :math:`\nu` which controls the smoothness of the
- resulting function. The general functional form of a Matérn is given by:
+ The function :func:`matern_kernel` is a generalization of the RBF kernel. It has
+ an additional parameter :math:`\nu` (set via the keyword ``coef0``) which controls
+ the smoothness of the resulting function. The general functional form of the
+ Matérn kernel is given by:

.. math::

-    k(d) = \sigma^2 \frac{1}{\Gamma(\nu) 2^{\nu-1}} \Bigg(\sqrt{2\nu} \frac{d}{\rho}\Bigg)^\nu K_\nu \Bigg(\sqrt{2\nu} \frac{d}{\rho}\Bigg),
+    k(d) = \sigma^2 \frac{1}{\Gamma(\nu) 2^{\nu-1}} \Bigg(\gamma \sqrt{2\nu} d\Bigg)^\nu K_\nu \Bigg(\gamma \sqrt{2\nu} d\Bigg),

- where :math:`d = \|x-y\|^2` and ``x`` and ``y`` are the input vectors.
+ where :math:`d = \|x-y\|` and ``x`` and ``y`` are the input vectors.
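For concreteness, the general form can be evaluated directly with SciPy's
modified Bessel function of the second kind. The helper below is a hypothetical
illustration of the formula only; it is not the ``matern_kernel`` function
added by this patch::

    import numpy as np
    from scipy.special import gamma, kv  # kv is the modified Bessel function K_nu

    def matern_general(d, nu=1.5, gam=1.0, sigma2=1.0):
        # k(d) = sigma^2 * 2^(1 - nu) / Gamma(nu)
        #        * (gam * sqrt(2 nu) * d)^nu * K_nu(gam * sqrt(2 nu) * d)
        d = np.asarray(d, dtype=float)
        arg = gam * np.sqrt(2.0 * nu) * d
        with np.errstate(divide="ignore", invalid="ignore"):
            k = sigma2 * (2.0 ** (1.0 - nu) / gamma(nu)) * arg ** nu * kv(nu, arg)
        # K_nu diverges at 0, but k(d) -> sigma^2 in the limit d -> 0.
        return np.where(d == 0.0, sigma2, k)

    print(matern_general([0.0, 0.5, 1.0, 2.0], nu=1.5))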

As :math:`\nu \rightarrow \infty`, the Matérn kernel converges to the RBF kernel.
When :math:`\nu = 1/2`, the Matérn kernel becomes identical to the absolute
exponential kernel, i.e.,

.. math::

-    k(d) = \sigma^2 \exp \Bigg(-\frac{d}{\rho}\Bigg) \quad \quad \nu = \tfrac{1}{2}
+    k(d) = \sigma^2 \exp \Bigg(-\gamma d\Bigg) \quad \quad \nu = \tfrac{1}{2}

- See Rasmussen and Williams 2006, pp84 for further details regarding the
- different variants of the Matérn kernel. In particular, :math:`\nu = 3/2`:
+ In particular, :math:`\nu = 3/2`:

.. math::

-    k(d) = \sigma^2 \Bigg(1 + \frac{\sqrt{3} d}{\rho}\Bigg) \exp \Bigg(-\frac{\sqrt{3} d}{\rho}\Bigg) \quad \quad \nu = \tfrac{3}{2}
+    k(d) = \sigma^2 \Bigg(1 + \gamma \sqrt{3} d\Bigg) \exp \Bigg(-\gamma \sqrt{3} d\Bigg) \quad \quad \nu = \tfrac{3}{2}

and :math:`\nu = 5/2`:

.. math::

-    k(d) = \sigma^2 \Bigg(1 + \frac{\sqrt{5} d}{\rho} + \frac{5 d^2}{3 \rho^2}\Bigg) \exp \Bigg(-\frac{\sqrt{5} d}{\rho}\Bigg) \quad \quad \nu = \tfrac{5}{2}.
+    k(d) = \sigma^2 \Bigg(1 + \gamma \sqrt{5} d + \frac{5}{3} \gamma^2 d^2\Bigg) \exp \Bigg(-\gamma \sqrt{5} d\Bigg) \quad \quad \nu = \tfrac{5}{2}

are popular choices for learning functions that are not infinitely
differentiable (as assumed by the RBF kernel) but at least once (:math:`\nu =
3/2`) or twice differentiable (:math:`\nu = 5/2`).
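The closed forms above can be checked numerically against the general
expression; a minimal sketch, assuming only NumPy and SciPy::

    import numpy as np
    from scipy.special import gamma, kv

    d, gam = np.linspace(0.1, 3.0, 30), 1.3

    def general(nu):
        arg = gam * np.sqrt(2.0 * nu) * d
        return (2.0 ** (1.0 - nu) / gamma(nu)) * arg ** nu * kv(nu, arg)

    # nu = 1/2: absolute exponential kernel
    assert np.allclose(general(0.5), np.exp(-gam * d))
    # nu = 3/2: once-differentiable sample paths
    assert np.allclose(general(1.5), (1 + gam * np.sqrt(3) * d)
                       * np.exp(-gam * np.sqrt(3) * d))
    # nu = 5/2: twice-differentiable sample paths
    assert np.allclose(general(2.5),
                       (1 + gam * np.sqrt(5) * d + 5.0 / 3.0 * (gam * d) ** 2)
                       * np.exp(-gam * np.sqrt(5) * d))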

+ The following example illustrates how the Matérn kernel's covariance decreases
+ with increasing dissimilarity of the two inputs for different values of
+ ``coef0`` (the parameter :math:`\nu` of the Matérn kernel):
+
+ .. figure:: ../auto_examples/metrics/images/plot_matern_kernel_001.png
+    :target: ../auto_examples/metrics/plot_matern_kernel.html
+    :align: center
+
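A rough sketch in the spirit of this figure (not the actual example script)
plots the covariance against the distance for a few values of :math:`\nu`::

    import numpy as np
    import matplotlib.pyplot as plt
    from scipy.special import gamma, kv

    d = np.linspace(1e-3, 4.0, 200)
    for nu in [0.5, 1.5, 2.5, 10.0]:
        arg = np.sqrt(2.0 * nu) * d  # inverse length-scale gamma fixed to 1
        k = (2.0 ** (1.0 - nu) / gamma(nu)) * arg ** nu * kv(nu, arg)
        plt.plot(d, k, label=r"$\nu = %g$" % nu)
    plt.xlabel("distance d between the two inputs")
    plt.ylabel("covariance k(d)")
    plt.legend()
    plt.show()

Smaller values of :math:`\nu` make the covariance drop more sharply near
:math:`d = 0`, corresponding to rougher functions.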

+ The flexibility of controlling the smoothness of the learned function via
+ ``coef0`` allows adapting to the properties of the true underlying functional
+ relation. The following example shows that support vector regression with a
+ Matérn kernel with smaller values of ``coef0`` can better approximate a
+ discontinuous step function:
+
+ .. figure:: ../auto_examples/svm/images/plot_svm_matern_kernel_001.png
+    :target: ../auto_examples/svm/plot_svm_matern_kernel.html
+    :align: center
+
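As a minimal sketch of the idea behind this example (not the example script
itself), support vector regression can consume a Matérn Gram matrix through
the ``precomputed`` kernel option; here with the :math:`\nu = 1/2` closed
form, since the hypothetical helper below only stands in for ``matern_kernel``::

    import numpy as np
    from scipy.spatial.distance import cdist
    from sklearn.svm import SVR

    rng = np.random.RandomState(0)
    gam = 3.0

    def matern_half_gram(X, Y):
        # nu = 1/2 Matérn: k(d) = exp(-gamma * d) with d = ||x - y||
        return np.exp(-gam * cdist(X, Y))

    # Noisy samples of a discontinuous step function
    X_train = rng.uniform(-1.0, 1.0, size=(100, 1))
    y_train = (X_train[:, 0] > 0.0).astype(float) + 0.05 * rng.randn(100)
    X_test = np.linspace(-1.0, 1.0, 200)[:, np.newaxis]

    svr = SVR(kernel="precomputed", C=10.0)
    svr.fit(matern_half_gram(X_train, X_train), y_train)     # (n_train, n_train)
    y_pred = svr.predict(matern_half_gram(X_test, X_train))  # (n_test, n_train)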

+ See Rasmussen and Williams (2006), p. 84, for further details regarding the
+ different variants of the Matérn kernel.
+

Chi-squared kernel
------------------
@@ -207,3 +228,8 @@ The chi squared kernel is most commonly used on histograms (bags) of visual word
International Journal of Computer Vision 2007
http://eprints.pascal-network.org/archive/00002309/01/Zhang06-IJCV.pdf

+ * Rasmussen, C. E. and Williams, C. K. I.
+   Gaussian Processes for Machine Learning
+   The MIT Press, 2006
+   http://www.gaussianprocess.org/gpml/chapters/