DOC Highlight difference between SVC/R and LinearSVC/R (#26825) · scikit-learn/scikit-learn@636200d · GitHub

Commit 636200d

StefanieSenger authored and jeremiedbb committed

DOC Highlight difference between SVC/R and LinearSVC/R (#26825)

1 parent 7f8dd75 commit 636200d

File tree: 3 files changed (+85, -47 lines)

doc/modules/svm.rst

Lines changed: 22 additions & 12 deletions
@@ -60,14 +60,19 @@ capable of performing binary and multi-class classification on a dataset.
    :align: center


-:class:`SVC` and :class:`NuSVC` are similar methods, but accept
-slightly different sets of parameters and have different mathematical
-formulations (see section :ref:`svm_mathematical_formulation`). On the
-other hand, :class:`LinearSVC` is another (faster) implementation of Support
-Vector Classification for the case of a linear kernel. Note that
-:class:`LinearSVC` does not accept parameter ``kernel``, as this is
-assumed to be linear. It also lacks some of the attributes of
-:class:`SVC` and :class:`NuSVC`, like ``support_``.
+:class:`SVC` and :class:`NuSVC` are similar methods, but accept slightly
+different sets of parameters and have different mathematical formulations (see
+section :ref:`svm_mathematical_formulation`). On the other hand,
+:class:`LinearSVC` is another (faster) implementation of Support Vector
+Classification for the case of a linear kernel. It also
+lacks some of the attributes of :class:`SVC` and :class:`NuSVC`, like
+`support_`. :class:`LinearSVC` uses `squared_hinge` loss and due to its
+implementation in `liblinear` it also regularizes the intercept, if considered.
+This effect can however be reduced by carefully fine tuning its
+`intercept_scaling` parameter, which allows the intercept term to have a
+different regularization behavior compared to the other features. The
+classification results and score can therefore differ from the other two
+classifiers.

 As other classifiers, :class:`SVC`, :class:`NuSVC` and
 :class:`LinearSVC` take as input two arrays: an array `X` of shape
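For illustration, a minimal sketch of the difference this paragraph documents; the dataset helper, `C`, and `intercept_scaling` values below are assumptions for demonstration, not part of the commit:

```python
# Illustrative sketch, not from the commit: compare SVC (libsvm, hinge loss,
# unregularized intercept) with LinearSVC (liblinear, squared hinge loss,
# regularized intercept) on the same synthetic data.
from sklearn.datasets import make_classification
from sklearn.svm import SVC, LinearSVC

X, y = make_classification(n_samples=200, random_state=0)

svc = SVC(kernel="linear", C=1.0).fit(X, y)
lsvc = LinearSVC(C=1.0, intercept_scaling=10.0, max_iter=10_000).fit(X, y)

# Scores and intercepts are typically close but not identical.
print(svc.score(X, y), lsvc.score(X, y))
print(svc.intercept_, lsvc.intercept_)
```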
@@ -314,10 +319,15 @@ target.

 There are three different implementations of Support Vector Regression:
 :class:`SVR`, :class:`NuSVR` and :class:`LinearSVR`. :class:`LinearSVR`
-provides a faster implementation than :class:`SVR` but only considers
-the linear kernel, while :class:`NuSVR` implements a slightly different
-formulation than :class:`SVR` and :class:`LinearSVR`. See
-:ref:`svm_implementation_details` for further details.
+provides a faster implementation than :class:`SVR` but only considers the
+linear kernel, while :class:`NuSVR` implements a slightly different formulation
+than :class:`SVR` and :class:`LinearSVR`. Due to its implementation in
+`liblinear` :class:`LinearSVR` also regularizes the intercept, if considered.
+This effect can however be reduced by carefully fine tuning its
+`intercept_scaling` parameter, which allows the intercept term to have a
+different regularization behavior compared to the other features. The
+classification results and score can therefore differ from the other two
+classifiers. See :ref:`svm_implementation_details` for further details.

 As with classification classes, the fit method will take as
 argument vectors X, y, only that in this case y is expected to have

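The same caveat for regression, as a hedged sketch (the dataset helper and parameter values are illustrative assumptions, not from the commit):

```python
# Illustrative sketch, not from the commit: the intercept-regularization
# caveat applies to LinearSVR vs. SVR as well.
from sklearn.datasets import make_regression
from sklearn.svm import SVR, LinearSVR

X, y = make_regression(n_samples=200, n_features=5, noise=0.5, random_state=0)

svr = SVR(kernel="linear", C=1.0).fit(X, y)
lsvr = LinearSVR(C=1.0, intercept_scaling=10.0, max_iter=100_000).fit(X, y)

# R^2 scores usually differ slightly between the two implementations.
print(svr.score(X, y), lsvr.score(X, y))
```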
sklearn/svm/_base.py

Lines changed: 17 additions & 9 deletions
@@ -825,7 +825,7 @@ def predict(self, X):
     def _check_proba(self):
         if not self.probability:
             raise AttributeError(
-                "predict_proba is not available when  probability=False"
+                "predict_proba is not available when probability=False"
             )
         if self._impl not in ("c_svc", "nu_svc"):
             raise AttributeError("predict_proba only implemented for SVC and NuSVC")
@@ -835,7 +835,7 @@ def _check_proba(self):
     def predict_proba(self, X):
         """Compute probabilities of possible outcomes for samples in X.

-        The model need to have probability information computed at training
+        The model needs to have probability information computed at training
         time: fit with attribute `probability` set to True.

         Parameters
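A brief sketch of the documented requirement (the dataset choice is an assumption):

```python
# Illustrative sketch, not from the commit: fitting with probability=True
# computes the probability information the docstring refers to.
from sklearn.datasets import load_iris
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
clf = SVC(probability=True, random_state=0).fit(X, y)
print(clf.predict_proba(X[:3]).shape)  # (3, 3): one column per class
```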
@@ -1095,18 +1095,26 @@ def _fit_liblinear(
         Target vector relative to X

     C : float
-        Inverse of cross-validation parameter. Lower the C, the more
+        Inverse of cross-validation parameter. The lower the C, the higher
         the penalization.

     fit_intercept : bool
-        Whether or not to fit the intercept, that is to add a intercept
-        term to the decision function.
+        Whether or not to fit an intercept. If set to True, the feature vector
+        is extended to include an intercept term: ``[x_1, ..., x_n, 1]``, where
+        1 corresponds to the intercept. If set to False, no intercept will be
+        used in calculations (i.e. data is expected to be already centered).

     intercept_scaling : float
-        LibLinear internally penalizes the intercept and this term is subject
-        to regularization just like the other terms of the feature vector.
-        In order to avoid this, one should increase the intercept_scaling.
-        such that the feature vector becomes [x, intercept_scaling].
+        Liblinear internally penalizes the intercept, treating it like any
+        other term in the feature vector. To reduce the impact of the
+        regularization on the intercept, the `intercept_scaling` parameter can
+        be set to a value greater than 1; the higher the value of
+        `intercept_scaling`, the lower the impact of regularization on it.
+        Then, the weights become `[w_x_1, ..., w_x_n,
+        w_intercept*intercept_scaling]`, where `w_x_1, ..., w_x_n` represent
+        the feature weights and the intercept weight is scaled by
+        `intercept_scaling`. This scaling allows the intercept term to have a
+        different regularization behavior compared to the other features.

     class_weight : dict or 'balanced', default=None
         Weights associated with classes in the form ``{class_label: weight}``.

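Since `_fit_liblinear` is private, a hedged sketch through the public `LinearSVC` illustrates the documented shrinkage; the data shift, `C`, and scaling values are assumptions chosen to make the effect visible:

```python
# Illustrative sketch, not from the commit: with a strong penalty (small C)
# and data needing a large offset, the regularized intercept is shrunk;
# raising intercept_scaling typically lessens that shrinkage.
from sklearn.datasets import make_blobs
from sklearn.svm import LinearSVC

X, y = make_blobs(centers=2, random_state=0)
X = X + 100.0  # shift the data so a large intercept is required

for scaling in (1.0, 100.0):
    clf = LinearSVC(C=0.01, intercept_scaling=scaling, max_iter=100_000).fit(X, y)
    print(scaling, clf.intercept_)
```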
sklearn/svm/_classes.py

Lines changed: 46 additions & 26 deletions
@@ -49,6 +49,10 @@ class LinearSVC(LinearClassifierMixin, SparseCoefMixin, BaseEstimator):
     penalties and loss functions and should scale better to large numbers of
     samples.

+    The main differences between :class:`~sklearn.svm.LinearSVC` and
+    :class:`~sklearn.svm.SVC` lie in the loss function used by default, and in
+    the handling of intercept regularization between those two implementations.
+
     This class supports both dense and sparse input and the multiclass support
     is handled according to a one-vs-the-rest scheme.

@@ -99,20 +103,26 @@ class LinearSVC(LinearClassifierMixin, SparseCoefMixin, BaseEstimator):
         will be ignored.

     fit_intercept : bool, default=True
-        Whether to calculate the intercept for this model. If set
-        to false, no intercept will be used in calculations
-        (i.e. data is expected to be already centered).
+        Whether or not to fit an intercept. If set to True, the feature vector
+        is extended to include an intercept term: `[x_1, ..., x_n, 1]`, where
+        1 corresponds to the intercept. If set to False, no intercept will be
+        used in calculations (i.e. data is expected to be already centered).

     intercept_scaling : float, default=1.0
-        When self.fit_intercept is True, instance vector x becomes
-        ``[x, self.intercept_scaling]``,
-        i.e. a "synthetic" feature with constant value equals to
-        intercept_scaling is appended to the instance vector.
-        The intercept becomes intercept_scaling * synthetic feature weight
-        Note! the synthetic feature weight is subject to l1/l2 regularization
-        as all other features.
-        To lessen the effect of regularization on synthetic feature weight
-        (and therefore on the intercept) intercept_scaling has to be increased.
+        When `fit_intercept` is True, the instance vector x becomes ``[x_1,
+        ..., x_n, intercept_scaling]``, i.e. a "synthetic" feature with a
+        constant value equal to `intercept_scaling` is appended to the instance
+        vector. The intercept becomes intercept_scaling * synthetic feature
+        weight. Note that liblinear internally penalizes the intercept,
+        treating it like any other term in the feature vector. To reduce the
+        impact of the regularization on the intercept, the `intercept_scaling`
+        parameter can be set to a value greater than 1; the higher the value of
+        `intercept_scaling`, the lower the impact of regularization on it.
+        Then, the weights become `[w_x_1, ..., w_x_n,
+        w_intercept*intercept_scaling]`, where `w_x_1, ..., w_x_n` represent
+        the feature weights and the intercept weight is scaled by
+        `intercept_scaling`. This scaling allows the intercept term to have a
+        different regularization behavior compared to the other features.

     class_weight : dict or 'balanced', default=None
         Set the parameter C of class i to ``class_weight[i]*C`` for
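The synthetic-feature description can be sanity-checked with a sketch like the following (values are illustrative assumptions; agreement is only up to solver tolerance, not exact):

```python
# Illustrative sketch, not from the commit: appending a constant column equal
# to intercept_scaling and fitting with fit_intercept=False poses the same
# problem liblinear solves internally; the feature weights match and the
# intercept is recovered as intercept_scaling * last weight.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import LinearSVC

X, y = make_classification(n_samples=100, n_features=4, random_state=0)
scaling = 5.0

a = LinearSVC(intercept_scaling=scaling, max_iter=100_000, random_state=0).fit(X, y)

X_aug = np.hstack([X, np.full((len(X), 1), scaling)])
b = LinearSVC(fit_intercept=False, max_iter=100_000, random_state=0).fit(X_aug, y)

print(a.coef_, a.intercept_)
print(b.coef_[:, :-1], b.coef_[:, -1] * scaling)  # approximately the same
```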
@@ -362,6 +372,10 @@ class LinearSVR(RegressorMixin, LinearModel):
     penalties and loss functions and should scale better to large numbers of
     samples.

+    The main differences between :class:`~sklearn.svm.LinearSVR` and
+    :class:`~sklearn.svm.SVR` lie in the loss function used by default, and in
+    the handling of intercept regularization between those two implementations.
+
     This class supports both dense and sparse input.

     Read more in the :ref:`User Guide <svm_regression>`.
@@ -389,20 +403,26 @@ class LinearSVR(RegressorMixin, LinearModel):
         loss ('squared_epsilon_insensitive') is the L2 loss.

     fit_intercept : bool, default=True
-        Whether to calculate the intercept for this model. If set
-        to false, no intercept will be used in calculations
-        (i.e. data is expected to be already centered).
+        Whether or not to fit an intercept. If set to True, the feature vector
+        is extended to include an intercept term: `[x_1, ..., x_n, 1]`, where
+        1 corresponds to the intercept. If set to False, no intercept will be
+        used in calculations (i.e. data is expected to be already centered).

     intercept_scaling : float, default=1.0
-        When self.fit_intercept is True, instance vector x becomes
-        [x, self.intercept_scaling],
-        i.e. a "synthetic" feature with constant value equals to
-        intercept_scaling is appended to the instance vector.
-        The intercept becomes intercept_scaling * synthetic feature weight
-        Note! the synthetic feature weight is subject to l1/l2 regularization
-        as all other features.
-        To lessen the effect of regularization on synthetic feature weight
-        (and therefore on the intercept) intercept_scaling has to be increased.
+        When `fit_intercept` is True, the instance vector x becomes `[x_1, ...,
+        x_n, intercept_scaling]`, i.e. a "synthetic" feature with a constant
+        value equal to `intercept_scaling` is appended to the instance vector.
+        The intercept becomes intercept_scaling * synthetic feature weight.
+        Note that liblinear internally penalizes the intercept, treating it
+        like any other term in the feature vector. To reduce the impact of the
+        regularization on the intercept, the `intercept_scaling` parameter can
+        be set to a value greater than 1; the higher the value of
+        `intercept_scaling`, the lower the impact of regularization on it.
+        Then, the weights become `[w_x_1, ..., w_x_n,
+        w_intercept*intercept_scaling]`, where `w_x_1, ..., w_x_n` represent
+        the feature weights and the intercept weight is scaled by
+        `intercept_scaling`. This scaling allows the intercept term to have a
+        different regularization behavior compared to the other features.

     dual : "auto" or bool, default=True
         Select the algorithm to either solve the dual or primal
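A short sketch of the two losses named above (dataset and parameter values are illustrative assumptions):

```python
# Illustrative sketch, not from the commit: LinearSVR's epsilon-insensitive
# (L1) and squared epsilon-insensitive (L2) losses; both interact with the
# intercept_scaling mechanism documented above.
from sklearn.datasets import make_regression
from sklearn.svm import LinearSVR

X, y = make_regression(n_samples=150, n_features=3, noise=1.0, random_state=0)

l1 = LinearSVR(loss="epsilon_insensitive", intercept_scaling=10.0,
               max_iter=100_000).fit(X, y)
l2 = LinearSVR(loss="squared_epsilon_insensitive", intercept_scaling=10.0,
               max_iter=100_000).fit(X, y)
print(l1.intercept_, l2.intercept_)
```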
@@ -462,8 +482,8 @@ class LinearSVR(RegressorMixin, LinearModel):
         same library as this class (liblinear).

     SVR : Implementation of Support Vector Machine regression using libsvm:
-        the kernel can be non-linear but its SMO algorithm does not
-        scale to large number of samples as LinearSVC does.
+        the kernel can be non-linear but its SMO algorithm does not scale to
+        large number of samples as :class:`~sklearn.svm.LinearSVR` does.

     sklearn.linear_model.SGDRegressor : SGDRegressor can optimize the same cost
         function as LinearSVR
