Clustering
==========

- `Clustering <http://en.wikipedia.org/wiki/Cluster_analysis>`__ of
+ `Clustering <https://en.wikipedia.org/wiki/Cluster_analysis>`_ of
  unlabeled data can be performed with the module :mod:`sklearn.cluster`.

Each clustering algorithm comes in two variants: a class, that implements
@@ -152,7 +152,7 @@ It suffers from various drawbacks:
better and zero is optimal. But in very high-dimensional spaces, Euclidean
distances tend to become inflated
(this is an instance of the so-called "curse of dimensionality").
- Running a dimensionality reduction algorithm such as `PCA <PCA>`
+ Running a dimensionality reduction algorithm such as `PCA <PCA>`_
prior to k-means clustering can alleviate this problem
and speed up the computations.
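A minimal sketch of that preprocessing step (the synthetic dataset and the
``n_components`` value below are illustrative choices, not recommendations
from this guide)::

    from sklearn.cluster import KMeans
    from sklearn.datasets import make_blobs
    from sklearn.decomposition import PCA

    # Synthetic high-dimensional data
    X, _ = make_blobs(n_samples=1000, n_features=50, centers=4, random_state=0)

    # Reduce dimensionality before clustering to mitigate inflated
    # Euclidean distances
    X_reduced = PCA(n_components=10).fit_transform(X)
    labels = KMeans(n_clusters=4, random_state=0).fit_predict(X_reduced)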
@@ -208,8 +208,8 @@ each job).

.. warning::

-    The parallel version of K-Means is broken on OS X when numpy uses the
-    Accelerate Framework. This is expected behavior: Accelerate can be called
+    The parallel version of K-Means is broken on OS X when `numpy` uses the
+    `Accelerate` Framework. This is expected behavior: `Accelerate` can be called
     after a fork but you need to execv the subprocess with the Python binary
     (which multiprocessing does not do under posix).
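A sketch of the single-process workaround, assuming the ``n_jobs``
parameter exposed by :class:`KMeans` in this version::

    from sklearn.cluster import KMeans

    # n_jobs=1 keeps the computation in a single process, avoiding the
    # fork-without-execv issue described in the warning above
    km = KMeans(n_clusters=8, n_jobs=1)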
@@ -323,6 +323,7 @@ appropriate for small to medium sized datasets.

* :ref:`example_applications_plot_stock_market.py` Affinity Propagation on
  Financial time series to find groups of companies

+
**Algorithm description:**
The messages sent between points belong to one of two categories. The first is
the responsibility :math:`r(i, k)`,
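As a usage-level complement to the algorithm description, a minimal sketch
of fitting :class:`AffinityPropagation` (the synthetic data is illustrative
only)::

    from sklearn.cluster import AffinityPropagation
    from sklearn.datasets import make_blobs

    X, _ = make_blobs(n_samples=300, centers=3, random_state=0)

    af = AffinityPropagation().fit(X)
    print(af.cluster_centers_indices_)  # indices of the chosen exemplars
    print(af.labels_)                   # cluster assignment of each sample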
@@ -361,9 +362,8 @@ Mean Shift

:class:`MeanShift` clustering aims to discover *blobs* in a smooth density of
samples. It is a centroid based algorithm, which works by updating candidates
for centroids to be the mean of the points within a given region. These
- candidates are then filtered in a
- post-processing stage to eliminate near-duplicates to form the final set of
- centroids.
+ candidates are then filtered in a post-processing stage to eliminate
+ near-duplicates to form the final set of centroids.

Given a candidate centroid :math:`x_i` for iteration :math:`t`, the candidate
is updated according to the following equation:
@@ -373,11 +373,10 @@ is updated according to the following equation:

    x_i^{t+1} = x_i^t + m(x_i^t)

Where :math:`N(x_i)` is the neighborhood of samples within a given distance
- around :math:`x_i` and :math:`m` is the *mean shift* vector that is computed
- for each centroid that
- points towards a region of the maximum increase in the density of points. This
- is computed using the following equation, effectively updating a centroid to be
- the mean of the samples within its neighborhood:
+ around :math:`x_i` and :math:`m` is the *mean shift* vector that is computed for each
+ centroid that points towards a region of the maximum increase in the density of points.
+ This is computed using the following equation, effectively updating a centroid
+ to be the mean of the samples within its neighborhood:

.. math::
@@ -412,7 +411,7 @@ given sample.

* `"Mean shift: A robust approach toward feature space analysis."
  <http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.76.8968&rep=rep1&type=pdf>`_
- D. Comaniciu, & P. Meer *IEEE Transactions on Pattern Analysis and Machine Intelligence* (2002)
+ D. Comaniciu and P. Meer, *IEEE Transactions on Pattern Analysis and Machine Intelligence* (2002)
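A minimal sketch of the estimator described above, with the bandwidth (the
radius of the neighborhood :math:`N(x_i)`) estimated from the data; the
``quantile`` value and the dataset are illustrative choices::

    from sklearn.cluster import MeanShift, estimate_bandwidth
    from sklearn.datasets import make_blobs

    X, _ = make_blobs(n_samples=500, centers=3, random_state=0)

    # The bandwidth controls the size of the region used in the
    # mean shift update
    bandwidth = estimate_bandwidth(X, quantile=0.2)
    ms = MeanShift(bandwidth=bandwidth).fit(X)
    print(ms.cluster_centers_)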
.. _spectral_clustering:

@@ -524,7 +523,7 @@ build nested clusters by merging or splitting them successively. This
hierarchy of clusters is represented as a tree (or dendrogram). The root of the
tree is the unique cluster that gathers all the samples, the leaves being the
clusters with only one sample. See the `Wikipedia page
- <http://en.wikipedia.org/wiki/Hierarchical_clustering>`_ for more details.
+ <https://en.wikipedia.org/wiki/Hierarchical_clustering>`_ for more details.

The :class:`AgglomerativeClustering` object performs a hierarchical clustering
using a bottom up approach: each observation starts in its own cluster, and
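A minimal sketch of this bottom-up merging (the data and parameter values
are illustrative)::

    from sklearn.cluster import AgglomerativeClustering
    from sklearn.datasets import make_blobs

    X, _ = make_blobs(n_samples=150, centers=3, random_state=0)

    # Each sample starts in its own cluster; pairs of clusters are
    # merged until only n_clusters remain
    agg = AgglomerativeClustering(n_clusters=3, linkage='ward').fit(X)
    print(agg.labels_)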
@@ -1003,7 +1002,7 @@ random labelings by defining the adjusted Rand index as follows:

  L. Hubert and P. Arabie, Journal of Classification 1985

* `Wikipedia entry for the adjusted Rand index
-  <http://en.wikipedia.org/wiki/Rand_index#Adjusted_Rand_index>`_
+  <https://en.wikipedia.org/wiki/Rand_index#Adjusted_Rand_index>`_
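The index discussed in this section is available as
:func:`metrics.adjusted_rand_score`; a quick sketch (the label lists are
illustrative)::

    from sklearn import metrics

    labels_true = [0, 0, 0, 1, 1, 1]
    labels_pred = [0, 0, 1, 1, 2, 2]

    # Symmetric in its arguments and corrected for chance
    print(metrics.adjusted_rand_score(labels_true, labels_pred))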
.. _mutual_info_score:
@@ -1153,23 +1152,25 @@ calculated using a similar form to that of the adjusted Rand index:

* Strehl, Alexander, and Joydeep Ghosh (2002). "Cluster ensembles – a
  knowledge reuse framework for combining multiple partitions". Journal of
-  Machine Learning Research 3: 583–617. doi:10.1162/153244303321897735
+  Machine Learning Research 3: 583–617.
+  `doi:10.1162/153244303321897735 <http://strehl.com/download/strehl-jmlr02.pdf>`_.

* Vinh, Epps, and Bailey, (2009). "Information theoretic measures
  for clusterings comparison". Proceedings of the 26th Annual International
  Conference on Machine Learning - ICML '09.
-  doi:10.1145/1553374.1553511. ISBN 9781605585161.
+  `doi:10.1145/1553374.1553511 <http://dx.doi.org/10.1145/1553374.1553511>`_.
+  ISBN 9781605585161.

* Vinh, Epps, and Bailey, (2010). Information Theoretic Measures for
  Clusterings Comparison: Variants, Properties, Normalization and
-  Correction for Chance}, JMLR
+  Correction for Chance, JMLR
  http://jmlr.csail.mit.edu/papers/volume11/vinh10a/vinh10a.pdf

* `Wikipedia entry for the (normalized) Mutual Information
-  <http://en.wikipedia.org/wiki/Mutual_Information>`_
+  <https://en.wikipedia.org/wiki/Mutual_Information>`_

* `Wikipedia entry for the Adjusted Mutual Information
-  <http://en.wikipedia.org/wiki/Adjusted_Mutual_Information>`_
+  <https://en.wikipedia.org/wiki/Adjusted_Mutual_Information>`_
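The chance-corrected measure referenced above is available as
:func:`metrics.adjusted_mutual_info_score`; a quick sketch (the label
lists are illustrative)::

    from sklearn import metrics

    labels_true = [0, 0, 0, 1, 1, 1]
    labels_pred = [0, 0, 1, 1, 2, 2]

    # Mutual information corrected for chance agreement
    print(metrics.adjusted_mutual_info_score(labels_true, labels_pred))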
.. _homogeneity_completeness:
@@ -1240,7 +1241,7 @@ homogeneous but not complete::

Advantages
~~~~~~~~~~

- - **Bounded scores**: 0.0 is as bad as it can be, 1.0 is a perfect score
+ - **Bounded scores**: 0.0 is as bad as it can be, 1.0 is a perfect score.

- Intuitive interpretation: clustering with bad V-measure can be
  **qualitatively analyzed in terms of homogeneity and completeness**
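Both axes of that analysis can be obtained in a single call; a quick
sketch using :func:`metrics.homogeneity_completeness_v_measure` (the label
lists are illustrative)::

    from sklearn import metrics

    labels_true = [0, 0, 0, 1, 1, 1]
    labels_pred = [0, 0, 1, 1, 2, 2]

    # Returns the (homogeneity, completeness, v_measure) triple
    print(metrics.homogeneity_completeness_v_measure(labels_true, labels_pred))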
@@ -1375,7 +1376,8 @@ cluster analysis.

* Peter J. Rousseeuw (1987). "Silhouettes: a Graphical Aid to the
  Interpretation and Validation of Cluster Analysis". Computational
-  and Applied Mathematics 20: 53–65. doi:10.1016/0377-0427(87)90125-7.
+  and Applied Mathematics 20: 53–65.
+  `doi:10.1016/0377-0427(87)90125-7 <http://dx.doi.org/10.1016/0377-0427(87)90125-7>`_.
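A quick sketch of computing the coefficient with
:func:`metrics.silhouette_score` (the data and the number of clusters are
illustrative)::

    from sklearn import metrics
    from sklearn.cluster import KMeans
    from sklearn.datasets import make_blobs

    X, _ = make_blobs(n_samples=500, centers=4, random_state=0)
    labels = KMeans(n_clusters=4, random_state=0).fit_predict(X)

    # Mean silhouette coefficient over all samples; higher is better
    print(metrics.silhouette_score(X, labels, metric='euclidean'))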
Advantages