@@ -50,7 +50,7 @@ and will store the coefficients :math:`w` of the linear model in its
 
 The coefficient estimates for Ordinary Least Squares rely on the
 independence of the features. When features are correlated and the
-columns of the design matrix :math:`X` have an approximate linear
+columns of the design matrix :math:`X` have an approximately linear
 dependence, the design matrix becomes close to singular
 and as a result, the least-squares estimate becomes highly sensitive
 to random errors in the observed target, producing a large
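The sensitivity this hunk describes can be demonstrated with a small synthetic sketch (the data, sizes, and noise levels below are illustrative assumptions, not from the docs):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.RandomState(0)
x = rng.randn(30)
# Two nearly collinear columns: the design matrix is close to singular.
X = np.column_stack([x, x + 1e-6 * rng.randn(30)])
y = x + 0.01 * rng.randn(30)  # small random errors in the observed target

print(np.linalg.cond(X))  # very large condition number
reg = LinearRegression().fit(X, y)
print(reg.coef_)  # individual coefficients become large and unstable
```

The two coefficients roughly cancel each other while their sum stays near 1, which is the instability the paragraph warns about.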
@@ -68,7 +68,7 @@ It is possible to constrain all the coefficients to be non-negative, which may
 be useful when they represent some physical or naturally non-negative
 quantities (e.g., frequency counts or prices of goods).
 :class:`LinearRegression` accepts a boolean ``positive``
-parameter: when set to `True` `Non Negative Least Squares
+parameter: when set to `True` `Non-Negative Least Squares
 <https://en.wikipedia.org/wiki/Non-negative_least_squares>`_ are then applied.
 
 .. topic:: Examples:
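A minimal sketch of the ``positive`` parameter described above, with toy data whose true coefficients are genuinely non-negative (the values are made up for illustration):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Toy design matrix; true coefficients [0.5, 1.5] and intercept 0.1 are
# illustrative choices, both non-negative.
X = np.array([[1.0, 2.0], [3.0, 1.0], [2.0, 4.0], [5.0, 2.0]])
y = X @ np.array([0.5, 1.5]) + 0.1

reg = LinearRegression(positive=True).fit(X, y)
print(reg.coef_)  # every fitted coefficient is constrained to be >= 0
```

Since the noiseless data were generated with non-negative coefficients, the constrained fit recovers them exactly.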
@@ -140,15 +140,15 @@ the output with the highest value.
 
 It might seem questionable to use a (penalized) Least Squares loss to fit a
 classification model instead of the more traditional logistic or hinge
-losses. However in practice all those models can lead to similar
+losses. However, in practice, all those models can lead to similar
 cross-validation scores in terms of accuracy or precision/recall, while the
 penalized least squares loss used by the :class:`RidgeClassifier` allows for
 a very different choice of the numerical solvers with distinct computational
 performance profiles.
 
 The :class:`RidgeClassifier` can be significantly faster than e.g.
-:class:`LogisticRegression` with a high number of classes, because it is
-able to compute the projection matrix :math:`(X^T X)^{-1} X^T` only once.
+:class:`LogisticRegression` with a high number of classes because it can
+compute the projection matrix :math:`(X^T X)^{-1} X^T` only once.
 
 This classifier is sometimes referred to as a `Least Squares Support Vector
 Machines
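A quick sketch of the multiclass behaviour discussed in this hunk — one ridge fit covers all classes, and prediction picks the class with the highest decision value (the synthetic problem parameters are arbitrary):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import RidgeClassifier

# Synthetic 5-class problem; sizes chosen only for illustration.
X, y = make_classification(n_samples=200, n_features=20, n_informative=10,
                           n_classes=5, random_state=0)

clf = RidgeClassifier().fit(X, y)
# One decision value per class; predict() returns the argmax class.
print(clf.decision_function(X).shape)  # (200, 5)
```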
@@ -210,7 +210,7 @@ Lasso
 The :class:`Lasso` is a linear model that estimates sparse coefficients.
 It is useful in some contexts due to its tendency to prefer solutions
 with fewer non-zero coefficients, effectively reducing the number of
-features upon which the given solution is dependent. For this reason
+features upon which the given solution is dependent. For this reason,
 Lasso and its variants are fundamental to the field of compressed sensing.
 Under certain conditions, it can recover the exact set of non-zero
 coefficients (see
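The sparsity-recovery claim above can be sketched on synthetic data where only two of ten features carry signal (data and `alpha` value are illustrative assumptions):

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.RandomState(0)
X = rng.randn(50, 10)
true_coef = np.zeros(10)
true_coef[0], true_coef[3] = 3.0, -2.0  # only two features matter
y = X @ true_coef + 0.01 * rng.randn(50)

lasso = Lasso(alpha=0.1).fit(X, y)
print(np.flatnonzero(lasso.coef_))  # support is (close to) {0, 3}
```

The two informative coefficients survive (slightly shrunk toward zero by the penalty), while the noise features are driven to exact zeros.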
@@ -309,7 +309,7 @@ as the regularization path is computed only once instead of k+1 times
 when using k-fold cross-validation. However, such criteria needs a
 proper estimation of the degrees of freedom of the solution, are
 derived for large samples (asymptotic results) and assume the model
-is correct, i.e. that the data are actually generated by this model.
+is correct, i.e. that the data are generated by this model.
 They also tend to break when the problem is badly conditioned
 (more features than samples).
 
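A sketch of criterion-based selection via :class:`LassoLarsIC`, on a well-conditioned synthetic problem with many more samples than features, where the asymptotic assumptions above are plausible (problem sizes are illustrative):

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LassoLarsIC

# n_samples >> n_features, matching the large-sample regime the
# criteria assume; parameter values are arbitrary.
X, y = make_regression(n_samples=200, n_features=10, n_informative=3,
                       noise=2.0, random_state=0)

# BIC picks alpha from a single regularization path -- no k-fold refits.
model = LassoLarsIC(criterion="bic").fit(X, y)
print(model.alpha_)
```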
@@ -393,7 +393,7 @@ the regularization properties of :class:`Ridge`. We control the convex
 combination of :math:`\ell_1` and :math:`\ell_2` using the ``l1_ratio``
 parameter.
 
-Elastic-net is useful when there are multiple features which are
+Elastic-net is useful when there are multiple features that are
 correlated with one another. Lasso is likely to pick one of these
 at random, while elastic-net is likely to pick both.
 
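The "picks both" behaviour can be sketched with two nearly identical features (the data construction and penalty values are illustrative assumptions):

```python
import numpy as np
from sklearn.linear_model import ElasticNet

rng = np.random.RandomState(42)
base = rng.randn(100)
# Two almost-identical (strongly correlated) copies of one signal.
X = np.column_stack([base + 0.01 * rng.randn(100),
                     base + 0.01 * rng.randn(100)])
y = base

enet = ElasticNet(alpha=0.1, l1_ratio=0.5).fit(X, y)
print(enet.coef_)  # the l2 part spreads weight across both features
```

The :math:`\ell_2` component makes the objective strictly convex, so the weight is shared between the correlated columns instead of landing arbitrarily on one of them.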
@@ -500,7 +500,7 @@ The disadvantages of the LARS method include:
 in the discussion section of the Efron et al. (2004) Annals of
 Statistics article.
 
-The LARS model can be used using estimator :class:`Lars`, or its
+The LARS model can be used via the estimator :class:`Lars`, or its
 low-level implementation :func:`lars_path` or :func:`lars_path_gram`.
 
 
@@ -546,7 +546,7 @@ the residual.
 Instead of giving a vector result, the LARS solution consists of a
 curve denoting the solution for each value of the :math:`\ell_1` norm of the
 parameter vector. The full coefficients path is stored in the array
-``coef_path_``, which has size (n_features, max_features+1). The first
+``coef_path_`` of shape `(n_features, max_features + 1)`. The first
 column is always zero.
 
 .. topic:: References:
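The shape of ``coef_path_`` described in this hunk can be checked directly on a small fit (the synthetic problem sizes are arbitrary):

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lars

# Small synthetic regression task, for illustration only.
X, y = make_regression(n_samples=60, n_features=8, n_informative=3,
                       noise=1.0, random_state=0)

reg = Lars().fit(X, y)
# One column per step along the path; the first column is all zeros
# because the path starts from the empty model.
print(reg.coef_path_.shape)
print(reg.coef_path_[:, 0])
```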