DOC Fix warnings about references and links by cmarmo · Pull Request #14976 · scikit-learn/scikit-learn

DOC Fix warnings about references and links #14976


Merged · 15 commits · Sep 23, 2019
6 changes: 6 additions & 0 deletions doc/glossary.rst
@@ -1547,6 +1547,12 @@ functions or non-estimator constructors.
picklable. This means, for instance, that lambdas cannot be used
as estimator parameters.

``pos_label``
Value with which positive labels must be encoded in binary
classification problems in which the positive class is not assumed.
This value is typically required to compute asymmetric evaluation
metrics such as precision and recall.
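As a hedged illustration of the new glossary entry (not part of the diff), ``pos_label`` is how these metrics learn which class counts as positive when the labels are strings::

    from sklearn.metrics import precision_score, recall_score

    # String labels: the positive class cannot be assumed, so pass pos_label.
    y_true = ["spam", "ham", "spam", "spam", "ham"]
    y_pred = ["spam", "spam", "spam", "ham", "ham"]

    print(precision_score(y_true, y_pred, pos_label="spam"))  # 2/3
    print(recall_score(y_true, y_pred, pos_label="spam"))     # 2/3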

Member:
Maybe add something like: "This value is typically required to compute asymmetric evaluation metrics such as precision and recall."

Contributor Author:
Added in bdd700f

``random_state``
Whenever randomization is part of a Scikit-learn algorithm, a
``random_state`` parameter may be provided to control the random number
2 changes: 1 addition & 1 deletion doc/modules/computing.rst
@@ -565,7 +565,7 @@ These environment variables should be set before importing scikit-learn.

:SKLEARN_WORKING_MEMORY:

Sets the default value for the :term:`working_memory` argument of
Sets the default value for the `working_memory` argument of
:func:`sklearn.set_config`.

:SKLEARN_SEED:
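For context on the ``SKLEARN_WORKING_MEMORY`` entry above (not part of the diff), a minimal sketch of the in-session equivalent that the environment variable sets a default for; the value is in MiB::

    import sklearn

    # Equivalent in-session setting; SKLEARN_WORKING_MEMORY sets the same
    # default before import instead.
    sklearn.set_config(working_memory=512)
    print(sklearn.get_config()["working_memory"])  # 512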
17 changes: 11 additions & 6 deletions doc/modules/ensemble.rst
@@ -456,7 +456,7 @@ trees.
Scikit-learn 0.21 introduces two new experimental implementations of
gradient boosting trees, namely :class:`HistGradientBoostingClassifier`
and :class:`HistGradientBoostingRegressor`, inspired by
`LightGBM <https://github.com/Microsoft/LightGBM>`__.
`LightGBM <https://github.com/Microsoft/LightGBM>`__ (See [LightGBM]_).

These histogram-based estimators can be **orders of magnitude faster**
than :class:`GradientBoostingClassifier` and
@@ -825,7 +825,7 @@ Histogram-Based Gradient Boosting
Scikit-learn 0.21 introduces two new experimental implementations of
gradient boosting trees, namely :class:`HistGradientBoostingClassifier`
and :class:`HistGradientBoostingRegressor`, inspired by
`LightGBM <https://github.com/Microsoft/LightGBM>`__.
`LightGBM <https://github.com/Microsoft/LightGBM>`__ (See [LightGBM]_).

These histogram-based estimators can be **orders of magnitude faster**
than :class:`GradientBoostingClassifier` and
@@ -996,10 +996,15 @@ Finally, many parts of the implementation of

.. topic:: References

.. [XGBoost] Tianqi Chen, Carlos Guestrin, "XGBoost: A Scalable Tree
Boosting System". https://arxiv.org/abs/1603.02754
.. [LightGBM] Ke et. al. "LightGBM: A Highly Efficient Gradient
BoostingDecision Tree"
.. [F1999] Friedman, Jerome H., 1999, `"Stochastic Gradient Boosting"
<https://statweb.stanford.edu/~jhf/ftp/stobst.pdf>`_
.. [R2007] G. Ridgeway, "Generalized Boosted Models: A guide to the gbm
package", 2007
.. [XGBoost] Tianqi Chen, Carlos Guestrin, `"XGBoost: A Scalable Tree
Boosting System" <https://arxiv.org/abs/1603.02754>`_
.. [LightGBM] Ke et al. `"LightGBM: A Highly Efficient Gradient
Boosting Decision Tree" <https://papers.nips.cc/paper/
6907-lightgbm-a-highly-efficient-gradient-boosting-decision-tree>`_

.. _voting_classifier:

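A hedged usage sketch of the estimators documented above, assuming scikit-learn 0.21+ where they are experimental and need the explicit enabling import::

    from sklearn.experimental import enable_hist_gradient_boosting  # noqa
    from sklearn.ensemble import HistGradientBoostingClassifier
    from sklearn.datasets import make_classification

    X, y = make_classification(n_samples=10_000, random_state=0)
    clf = HistGradientBoostingClassifier(random_state=0).fit(X, y)
    print(clf.score(X, y))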
4 changes: 2 additions & 2 deletions doc/modules/neighbors.rst
@@ -720,5 +720,5 @@ added space complexity in the operation.
J. Goldberger, G. Hinton, S. Roweis, R. Salakhutdinov, Advances in
Neural Information Processing Systems, Vol. 17, May 2005, pp. 513-520.

.. [2] `Wikipedia entry on Neighborhood Components Analysis
<https://en.wikipedia.org/wiki/Neighbourhood_components_analysis>`_
`Wikipedia entry on Neighborhood Components Analysis
<https://en.wikipedia.org/wiki/Neighbourhood_components_analysis>`_
2 changes: 1 addition & 1 deletion doc/modules/partial_dependence.rst
@@ -125,5 +125,5 @@ which the trees were trained.
Statistical Learning <https://web.stanford.edu/~hastie/ElemStatLearn//>`_,
Second Edition, Section 10.13.2, Springer, 2009.

.. [Mol2019] C. Molnar, `Interpretable Machine Learning
C. Molnar, `Interpretable Machine Learning
<https://christophm.github.io/interpretable-ml-book/>`_, Section 5.1, 2019.
2 changes: 1 addition & 1 deletion examples/decomposition/plot_faces_decomposition.py
@@ -3,7 +3,7 @@
Faces dataset decompositions
============================

This example applies to :ref:`olivetti_faces` different unsupervised
This example applies to :ref:`olivetti_faces_dataset` different unsupervised
matrix decomposition (dimension reduction) methods from the module
:py:mod:`sklearn.decomposition` (see the documentation chapter
:ref:`decompositions`) .
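A minimal sketch of what the example does (fetch the Olivetti faces, fit one decomposition); the component count is illustrative::

    from sklearn.datasets import fetch_olivetti_faces
    from sklearn.decomposition import PCA

    faces = fetch_olivetti_faces(shuffle=True, random_state=0).data
    pca = PCA(n_components=6, whiten=True).fit(faces)
    print(pca.components_.shape)  # (6, 4096): one component per row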
2 changes: 1 addition & 1 deletion examples/inspection/plot_permutation_importance.py
@@ -20,7 +20,7 @@

.. topic:: References:

.. [1] L. Breiman, "Random Forests", Machine Learning, 45(1), 5-32,
[1] L. Breiman, "Random Forests", Machine Learning, 45(1), 5-32,
2001. https://doi.org/10.1023/A:1010933404324
"""
print(__doc__)
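For reference, a hedged sketch of the API this example demonstrates (``permutation_importance`` lives in ``sklearn.inspection`` as of 0.22)::

    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.inspection import permutation_importance

    X, y = make_classification(random_state=0)
    clf = RandomForestClassifier(random_state=0).fit(X, y)
    result = permutation_importance(clf, X, y, n_repeats=5, random_state=0)
    print(result.importances_mean)  # one mean importance per feature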
2 changes: 1 addition & 1 deletion examples/multioutput/plot_classifier_chain_yeast.py
@@ -10,7 +10,7 @@
data point has at least one label. As a baseline we first train a logistic
regression classifier for each of the 14 labels. To evaluate the performance of
these classifiers we predict on a held-out test set and calculate the
:ref:`jaccard score <jaccard_score>` for each sample.
:ref:`jaccard score <jaccard_similarity_score>` for each sample.

Next we create 10 classifier chains. Each classifier chain contains a
logistic regression model for each of the 14 labels. The models in each
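A hedged sketch of the per-sample Jaccard score the example's baseline reports (multilabel indicator input with ``average='samples'``)::

    import numpy as np
    from sklearn.metrics import jaccard_score

    y_true = np.array([[1, 0, 1], [0, 1, 0]])
    y_pred = np.array([[1, 1, 1], [0, 1, 0]])
    # Mean over samples of |intersection| / |union|: (2/3 + 1/1) / 2
    print(jaccard_score(y_true, y_pred, average="samples"))  # 0.833...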
4 changes: 2 additions & 2 deletions sklearn/decomposition/online_lda.py
@@ -274,8 +274,8 @@ class LatentDirichletAllocation(TransformerMixin, BaseEstimator):

References
----------
[1] "Online Learning for Latent Dirichlet Allocation", Matthew D. Hoffman,
David M. Blei, Francis Bach, 2010
.. [1] "Online Learning for Latent Dirichlet Allocation", Matthew D.
Hoffman, David M. Blei, Francis Bach, 2010
Member:
If we make such bibliographic references with number indices, we get identifier conflicts that sphinx automatically resolves using a hash of the text of the reference:

[Screenshot from 2019-09-16 09-33-14]

Which leads to:

[Screenshot from 2019-09-16 09-33-02]

in the body of the docstring when the reference is cited.

So we have two options:

  • either we keep using numbers as reference indices in such docstrings, but remove the ``_`` suffix from mentions such as ``[1]_`` and the ``..`` prefix from the entries, so that they are no longer actual references; the user then scrolls down and scans the docstring manually instead of following a generated link.

  • alternatively, we stop using integer indices in such references in docstrings and instead use unique identifiers such as ``[Hoffman2010]``.

For short class / function docstrings both options are possible. Whenever the text is long enough that the reader has to scroll more than one screen length, I believe the second option makes the most sense.

Updating all references to follow the second option (explicit identifiers) would make for a large PR, so if we want to do this I would do it in several small, localized PRs that can be merged progressively, starting with the files that actually cause sphinx warnings.
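A minimal sketch of the second option applied to a docstring like this one (the identifier and surrounding wording are illustrative, not the final patch)::

    def example_docstring():
        """Online variational Bayes for LDA [Hoffman2010]_.

        References
        ----------
        .. [Hoffman2010] "Online Learning for Latent Dirichlet Allocation",
           Matthew D. Hoffman, David M. Blei, Francis Bach, 2010
        """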

Contributor Author:
I prefer the second option. But then all refs (even those not linked in the text) must be updated for consistency (and future use).

Contributor Author:
@ogrisel, do you mind if I address the citation problem in a new issue? Just to focus here on the sphinx warnings... and close this one ASAP?

Contributor Author:
Note that the core of the referencing problem is also reported in #4344.

Member:
I agree, let's focus on fixing the sphinx warnings first and leave the general consistency of references across the full codebase for later PRs.


[2] "Stochastic Variational Inference", Matthew D. Hoffman, David M. Blei,
Chong Wang, John Paisley, 2013
8 changes: 4 additions & 4 deletions sklearn/ensemble/_hist_gradient_boosting/gradient_boosting.py
@@ -751,13 +751,13 @@ class HistGradientBoostingRegressor(RegressorMixin, BaseHistGradientBoosting):
n_trees_per_iteration_ : int
The number of trees that are built at each iteration. For regressors,
this is always 1.
train_score_ : ndarray, shape (n_iter_ + 1,)
train_score_ : ndarray, shape (n_iter_+1,)
The scores at each iteration on the training data. The first entry
is the score of the ensemble before the first iteration. Scores are
computed according to the ``scoring`` parameter. If ``scoring`` is
not 'loss', scores are computed on a subset of at most 10 000
samples. Empty if no early stopping.
validation_score_ : ndarray, shape (n_iter_ + 1,)
validation_score_ : ndarray, shape (n_iter_+1,)
The scores at each iteration on the held-out validation data. The
first entry is the score of the ensemble before the first iteration.
Scores are computed according to the ``scoring`` parameter. Empty if
@@ -932,13 +932,13 @@ class HistGradientBoostingClassifier(BaseHistGradientBoosting,
The number of trees that are built at each iteration. This is equal to 1
for binary classification, and to ``n_classes`` for multiclass
classification.
train_score_ : ndarray, shape (n_iter_ + 1,)
train_score_ : ndarray, shape (n_iter_+1,)
The scores at each iteration on the training data. The first entry
is the score of the ensemble before the first iteration. Scores are
computed according to the ``scoring`` parameter. If ``scoring`` is
not 'loss', scores are computed on a subset of at most 10 000
samples. Empty if no early stopping.
validation_score_ : ndarray, shape (n_iter_ + 1,)
validation_score_ : ndarray, shape (n_iter_+1,)
The scores at each iteration on the held-out validation data. The
first entry is the score of the ensemble before the first iteration.
Scores are computed according to the ``scoring`` parameter. Empty if
2 changes: 1 addition & 1 deletion sklearn/ensemble/partial_dependence.py
@@ -261,7 +261,7 @@ def plot_partial_dependence(gbrt, X, features, feature_names=None,
Dict with keywords passed to the ``matplotlib.pyplot.plot`` call.
For two-way partial dependence plots.

**fig_kw : dict
``**fig_kw`` : dict
Dict with keywords passed to the figure() call.
Note that all keywords not recognized above will be automatically
included here.
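A hedged sketch of the call whose docstring is escaped above; this module-level helper was already on a deprecation path at the time, and anything not matching a named parameter (such as ``figsize`` here) lands in ``**fig_kw`` and is forwarded to ``figure()``::

    from sklearn.datasets import make_friedman1
    from sklearn.ensemble import GradientBoostingRegressor
    from sklearn.ensemble.partial_dependence import plot_partial_dependence

    X, y = make_friedman1(random_state=0)
    est = GradientBoostingRegressor(random_state=0).fit(X, y)
    # figsize is not a named parameter, so it travels through **fig_kw.
    fig, axs = plot_partial_dependence(est, X, features=[0, 1], figsize=(8, 3))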
2 changes: 1 addition & 1 deletion sklearn/impute/_knn.py
@@ -49,7 +49,7 @@ class KNNImputer(TransformerMixin, BaseEstimator):

- 'nan_euclidean'
- callable : a user-defined function which conforms to the definition
of _pairwise_callable(X, Y, metric, **kwds). The function
of ``_pairwise_callable(X, Y, metric, **kwds)``. The function
accepts two arrays, X and Y, and a `missing_values` keyword in
`kwds` and returns a scalar distance value.

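A hedged sketch of the two metric flavours described above; the custom callable is illustrative and relies only on the contract quoted in the docstring (two sample arrays plus a ``missing_values`` keyword, returning a scalar distance)::

    import numpy as np
    from sklearn.impute import KNNImputer

    X = np.array([[1.0, 2.0, np.nan],
                  [3.0, 4.0, 3.0],
                  [np.nan, 6.0, 5.0],
                  [8.0, 8.0, 7.0]])

    # Default 'nan_euclidean' metric.
    print(KNNImputer(n_neighbors=2).fit_transform(X))

    def masked_l1(x, y, missing_values=np.nan):
        # Scalar distance over coordinates observed in both samples.
        mask = ~(np.isnan(x) | np.isnan(y))
        return np.abs(x[mask] - y[mask]).sum()

    print(KNNImputer(n_neighbors=2, metric=masked_l1).fit_transform(X))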
2 changes: 1 addition & 1 deletion sklearn/linear_model/bayes.py
@@ -108,7 +108,7 @@ class BayesianRidge(RegressorMixin, LinearModel):
sigma_ : array-like of shape (n_features, n_features)
Estimated variance-covariance matrix of the weights

scores_ : array-like of shape (n_iter_ + 1,)
scores_ : array-like of shape (n_iter_+1,)
If ``compute_score`` is True, value of the log marginal likelihood (to be
maximized) at each iteration of the optimization. The array starts
with the value of the log marginal likelihood obtained for the initial
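A minimal sketch of where ``scores_`` comes from; it is only populated when ``compute_score=True``::

    import numpy as np
    from sklearn.linear_model import BayesianRidge

    rng = np.random.RandomState(0)
    X = rng.rand(50, 3)
    y = X @ np.array([1.0, 2.0, 3.0])
    reg = BayesianRidge(compute_score=True).fit(X, y)
    # One log marginal likelihood per iteration, plus the initial value.
    print(reg.n_iter_, reg.scores_.shape)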
4 changes: 2 additions & 2 deletions sklearn/metrics/ranking.py
@@ -1188,7 +1188,7 @@ def dcg_score(y_true, y_score, k=None,
References
----------
`Wikipedia entry for Discounted Cumulative Gain
<https://en.wikipedia.org/wiki/Discounted_cumulative_gain>`_
<https://en.wikipedia.org/wiki/Discounted_cumulative_gain>`_

Jarvelin, K., & Kekalainen, J. (2002).
Cumulated gain-based evaluation of IR techniques. ACM Transactions on
@@ -1336,7 +1336,7 @@ def ndcg_score(y_true, y_score, k=None, sample_weight=None, ignore_ties=False):
References
----------
`Wikipedia entry for Discounted Cumulative Gain
<https://en.wikipedia.org/wiki/Discounted_cumulative_gain>`_
<https://en.wikipedia.org/wiki/Discounted_cumulative_gain>`_

Jarvelin, K., & Kekalainen, J. (2002).
Cumulated gain-based evaluation of IR techniques. ACM Transactions on
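A hedged sketch of the two ranking metrics whose reference blocks are fixed above; both expect 2D arrays of per-query relevances and predicted scores::

    import numpy as np
    from sklearn.metrics import dcg_score, ndcg_score

    # True relevance of each document (one query), and predicted scores.
    y_true = np.asarray([[10, 0, 0, 1, 5]])
    y_score = np.asarray([[0.1, 0.2, 0.3, 4.0, 70.0]])

    print(dcg_score(y_true, y_score))
    print(ndcg_score(y_true, y_score))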
13 changes: 8 additions & 5 deletions sklearn/svm/classes.py
@@ -827,11 +827,14 @@ class NuSVC(BaseSVC):
Scalable linear Support Vector Machine for classification using
liblinear.

Notes
-----
**References:**
`LIBSVM: A Library for Support Vector Machines
<http://www.csie.ntu.edu.tw/~cjlin/papers/libsvm.pdf>`__
References
----------
.. [1] `LIBSVM: A Library for Support Vector Machines
<http://www.csie.ntu.edu.tw/~cjlin/papers/libsvm.pdf>`_

.. [2] `Platt, John (1999). "Probabilistic outputs for support vector
machines and comparison to regularized likelihood methods."
<http://citeseer.ist.psu.edu/viewdoc/summary?doi=10.1.1.41.1639>`_
"""

_impl = 'nu_svc'
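And a minimal usage sketch of the estimator whose references section is rebuilt above::

    import numpy as np
    from sklearn.svm import NuSVC

    X = np.array([[-1, -1], [-2, -1], [1, 1], [2, 1]])
    y = np.array([1, 1, 2, 2])
    clf = NuSVC(gamma="scale").fit(X, y)
    print(clf.predict([[-0.8, -1]]))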