DOC fix typos in the user guide linear model documentation by mohamed-khoualed · Pull Request #19554 · scikit-learn/scikit-learn
Merged · 3 commits · Jul 29, 2021
20 changes: 10 additions & 10 deletions doc/modules/linear_model.rst
@@ -50,7 +50,7 @@ and will store the coefficients :math:`w` of the linear model in its

The coefficient estimates for Ordinary Least Squares rely on the
independence of the features. When features are correlated and the
-columns of the design matrix :math:`X` have an approximate linear
+columns of the design matrix :math:`X` have an approximately linear
dependence, the design matrix becomes close to singular
and as a result, the least-squares estimate becomes highly sensitive
to random errors in the observed target, producing a large
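
A minimal sketch of the sensitivity described in this passage, using a made-up near-collinear design (the data and noise magnitudes are illustrative assumptions, not taken from the docs)::

    import numpy as np
    from sklearn.linear_model import LinearRegression

    rng = np.random.RandomState(0)
    x = rng.rand(50)
    # The second column is a nearly exact copy of the first, so the
    # design matrix X is close to singular.
    X = np.c_[x, x + 1e-8 * rng.randn(50)]
    y = 3 * x

    coef_clean = LinearRegression().fit(X, y).coef_
    # A tiny perturbation of the target can move the individual
    # coefficients by a large amount, even though their sum stays stable.
    coef_noisy = LinearRegression().fit(X, y + 1e-6 * rng.randn(50)).coef_
    print(coef_clean)
    print(coef_noisy)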
@@ -68,7 +68,7 @@ It is possible to constrain all the coefficients to be non-negative, which may
be useful when they represent some physical or naturally non-negative
quantities (e.g., frequency counts or prices of goods).
:class:`LinearRegression` accepts a boolean ``positive``
-parameter: when set to `True` `Non Negative Least Squares
+parameter: when set to `True` `Non-Negative Least Squares
<https://en.wikipedia.org/wiki/Non-negative_least_squares>`_ are then applied.
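
A short sketch of the ``positive`` parameter mentioned above, on a toy problem (the data are made up)::

    import numpy as np
    from sklearn.linear_model import LinearRegression

    X = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 4.0]])
    y = np.array([3.0, 2.0, 7.0])

    # With positive=True the fit solves a Non-Negative Least Squares
    # problem, so every coefficient is constrained to be >= 0.
    reg = LinearRegression(positive=True).fit(X, y)
    print(reg.coef_)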

.. topic:: Examples:
@@ -140,15 +140,15 @@ the output with the highest value.

It might seem questionable to use a (penalized) Least Squares loss to fit a
classification model instead of the more traditional logistic or hinge
-losses. However in practice all those models can lead to similar
+losses. However, in practice, all those models can lead to similar
cross-validation scores in terms of accuracy or precision/recall, while the
penalized least squares loss used by the :class:`RidgeClassifier` allows for
a very different choice of the numerical solvers with distinct computational
performance profiles.

The :class:`RidgeClassifier` can be significantly faster than e.g.
-:class:`LogisticRegression` with a high number of classes, because it is
-able to compute the projection matrix :math:`(X^T X)^{-1} X^T` only once.
+:class:`LogisticRegression` with a high number of classes because it can
+compute the projection matrix :math:`(X^T X)^{-1} X^T` only once.
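
A rough sketch of that comparison on synthetic data (the dataset sizes and any resulting scores are illustrative assumptions)::

    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression, RidgeClassifier

    X, y = make_classification(n_samples=2000, n_features=50,
                               n_informative=30, n_classes=10,
                               random_state=0)

    # RidgeClassifier can share a single least-squares solve across all
    # classes, while LogisticRegression runs an iterative solver.
    ridge = RidgeClassifier().fit(X, y)
    logreg = LogisticRegression(max_iter=1000).fit(X, y)
    print(ridge.score(X, y), logreg.score(X, y))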

This classifier is sometimes referred to as a `Least Squares Support Vector
Machines
@@ -210,7 +210,7 @@ Lasso
The :class:`Lasso` is a linear model that estimates sparse coefficients.
It is useful in some contexts due to its tendency to prefer solutions
with fewer non-zero coefficients, effectively reducing the number of
-features upon which the given solution is dependent. For this reason
+features upon which the given solution is dependent. For this reason,
Lasso and its variants are fundamental to the field of compressed sensing.
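
A minimal sketch of that sparsity, assuming a toy design where only two of twenty features carry signal::

    import numpy as np
    from sklearn.linear_model import Lasso

    rng = np.random.RandomState(0)
    X = rng.randn(100, 20)
    y = 3 * X[:, 0] - 2 * X[:, 1] + 0.01 * rng.randn(100)

    lasso = Lasso(alpha=0.1).fit(X, y)
    # Most coefficients are driven exactly to zero.
    print(np.sum(lasso.coef_ != 0))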
Under certain conditions, it can recover the exact set of non-zero
coefficients (see
@@ -309,7 +309,7 @@ as the regularization path is computed only once instead of k+1 times
when using k-fold cross-validation. However, such criteria need a
proper estimation of the degrees of freedom of the solution, are
derived for large samples (asymptotic results) and assume the model
-is correct, i.e. that the data are actually generated by this model.
+is correct, i.e. that the data are generated by this model.
They also tend to break when the problem is badly conditioned
(more features than samples).
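
A hedged sketch of criterion-based selection with :class:`LassoLarsIC`, on synthetic data (parameter values are illustrative)::

    from sklearn.datasets import make_regression
    from sklearn.linear_model import LassoLarsIC

    X, y = make_regression(n_samples=200, n_features=10, noise=4.0,
                           random_state=0)

    # The criterion ("aic" or "bic") picks alpha from a single
    # regularization path, instead of the k+1 fits that k-fold
    # cross-validation would require.
    model = LassoLarsIC(criterion="bic").fit(X, y)
    print(model.alpha_)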

@@ -393,7 +393,7 @@ the regularization properties of :class:`Ridge`. We control the convex
combination of :math:`\ell_1` and :math:`\ell_2` using the ``l1_ratio``
parameter.

-Elastic-net is useful when there are multiple features which are
+Elastic-net is useful when there are multiple features that are
correlated with one another. Lasso is likely to pick one of these
at random, while elastic-net is likely to pick both.
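
A small sketch of that behaviour with two perfectly correlated features (toy data; the exact coefficient values will vary)::

    import numpy as np
    from sklearn.linear_model import ElasticNet, Lasso

    rng = np.random.RandomState(0)
    x = rng.randn(100)
    X = np.c_[x, x]                   # two identical, fully correlated columns
    y = 2 * x + 0.01 * rng.randn(100)

    # Lasso tends to put all the weight on a single column ...
    print(Lasso(alpha=0.1).fit(X, y).coef_)
    # ... while elastic-net tends to spread it across both.
    print(ElasticNet(alpha=0.1, l1_ratio=0.5).fit(X, y).coef_)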

@@ -500,7 +500,7 @@ The disadvantages of the LARS method include:
in the discussion section of the Efron et al. (2004) Annals of
Statistics article.

-The LARS model can be used using estimator :class:`Lars`, or its
+The LARS model can be used via the estimator :class:`Lars`, or its
low-level implementation :func:`lars_path` or :func:`lars_path_gram`.
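
A brief sketch of both interfaces on toy data (shapes and values are illustrative)::

    import numpy as np
    from sklearn.linear_model import Lars, lars_path

    rng = np.random.RandomState(0)
    X = rng.randn(50, 5)
    y = X[:, 0] - 2 * X[:, 2] + 0.1 * rng.randn(50)

    reg = Lars().fit(X, y)                   # estimator interface
    alphas, active, coefs = lars_path(X, y)  # low-level path interface
    print(reg.coef_)
    print(coefs.shape)                       # (n_features, n_path_steps)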


@@ -546,7 +546,7 @@ the residual.
Instead of giving a vector result, the LARS solution consists of a
curve denoting the solution for each value of the :math:`\ell_1` norm of the
parameter vector. The full coefficients path is stored in the array
-``coef_path_``, which has size (n_features, max_features+1). The first
+``coef_path_`` of shape `(n_features, max_features + 1)`. The first
column is always zero.
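
A short sketch inspecting the stored path, assuming a :class:`LassoLars` fit on toy data::

    import numpy as np
    from sklearn.linear_model import LassoLars

    rng = np.random.RandomState(0)
    X = rng.randn(30, 4)
    y = X[:, 1] + 0.1 * rng.randn(30)

    reg = LassoLars(alpha=0.01).fit(X, y)
    print(reg.coef_path_.shape)  # (n_features, number of steps on the path)
    print(reg.coef_path_[:, 0])  # the first column is all zeros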

.. topic:: References: