BUG unpenalized Ridge does not give minimum norm solution #22947
Related: apparently

I'm working on it :)
Note that the default solver used in the snippet above will fall back to "svd" because the problem is singular. I tried the snippet above with "lsqr" and unfortunately it does not converge to the minimum norm solution either. As a sanity check I implemented the following naive solver:

```python
import numpy as np
from sklearn.linear_model import LinearRegression


class PInvLinearRegression(LinearRegression):
    def __init__(self, fit_intercept=True):
        self.fit_intercept = fit_intercept

    def fit(self, X, y):
        if self.fit_intercept:
            # Fit the intercept as an explicit extra column of ones
            # instead of mean centering the data.
            X = np.hstack([X, np.ones((X.shape[0], 1))])
        # Minimum norm least squares solution via the pseudo-inverse.
        fitted_params = np.linalg.pinv(X.T @ X) @ X.T @ y
        if self.fit_intercept:
            self.coef_ = fitted_params[:-1]
            self.intercept_ = fitted_params[-1]
        else:
            self.coef_ = fitted_params
            self.intercept_ = 0
        return self
```

I confirm that this approach recovers the expected minimal norm solution successfully, so the script in the description of this issue is correct. This pinv-based solver is not suitable for large `n_features`, though. I suspect that to recover the minimum norm solution with iterative solvers such as "lbfgs" or "lsqr", we need to do a dumb init of all the parameters (including the intercept) at zero.
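For illustration, a quick check with this solver might look as follows (a sketch with made-up data, assuming the class above is in scope):

```python
import numpy as np

rng = np.random.RandomState(0)
X = rng.randn(10, 100)  # wide data: n_features > n_samples
y = rng.randn(10)

reg = PInvLinearRegression(fit_intercept=True).fit(X, y)

# On a wide, consistent problem the minimum norm solution interpolates
# the training data exactly (up to numerical precision).
print(np.allclose(X @ reg.coef_ + reg.intercept_, y))  # True
print(np.linalg.norm(reg.coef_))  # norm of the recovered coefficients
```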
Actually, for `fit_intercept=True`, both `LinearRegression` and `Ridge` rely on mean centering the data before calling the solver:

- scikit-learn/sklearn/linear_model/_base.py, lines 276 to 277 in 30bf6f3
- scikit-learn/sklearn/linear_model/_ridge.py, lines 148 to 175 in 30bf6f3
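For context, the centering trick amounts to something like the following simplified sketch (not the actual scikit-learn code; `solve` stands for any solver of the centered, intercept-free problem):

```python
import numpy as np

def fit_with_centering(solve, X, y):
    # Center the data so that the intercept drops out of the problem.
    X_mean = X.mean(axis=0)
    y_mean = y.mean()
    coef = solve(X - X_mean, y - y_mean)
    # Recover the intercept from the means afterwards.
    intercept = y_mean - X_mean @ coef
    return coef, intercept
```

The solver never sees the intercept as a parameter, so whatever norm it implicitly minimizes on the centered problem is not the norm of the full parameter vector `(w, w_0)`; this is the incompatibility with the minimum norm solution described in this issue.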
If instead of mean centering we append the column of ones to `X` and hand the augmented system directly to `scipy.sparse.linalg.lsqr`:

```python
import numpy as np
from scipy.sparse import linalg as sparse_linalg
from sklearn.linear_model import LinearRegression


class LsqrLinearRegression(LinearRegression):
    def __init__(self, fit_intercept=True, tol=1e-5):
        self.fit_intercept = fit_intercept
        self.tol = tol

    def fit(self, X, y):
        if self.fit_intercept:
            # Fit the intercept as an explicit extra column of ones.
            X = np.hstack([X, np.ones((X.shape[0], 1))])
        # LSQR starts from zero and iterates in the row space of X, so it
        # converges to the minimum norm solution of the augmented problem.
        result = sparse_linalg.lsqr(
            X, y, damp=0, atol=self.tol, btol=self.tol, iter_lim=None
        )
        fitted_params = result[0]
        self.n_iter_ = result[2]
        if self.fit_intercept:
            self.coef_ = fitted_params[:-1]
            self.intercept_ = fitted_params[-1]
        else:
            self.coef_ = fitted_params
            self.intercept_ = 0
        return self
```

So I think we should try to avoid doing the mean centering and instead fit the intercept as an explicit parameter. Still, we would need to also empirically study the speed of convergence, both in terms of wall clock time and number of iterations (`n_iter_`).
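A quick way to compare the two reference solvers above (a sketch with made-up data, assuming both classes are in scope):

```python
import numpy as np

rng = np.random.RandomState(42)
X = rng.randn(20, 200)  # wide: n_features > n_samples
y = rng.randn(20)

pinv_reg = PInvLinearRegression().fit(X, y)
lsqr_reg = LsqrLinearRegression(tol=1e-10).fit(X, y)

# Both should recover the same minimum norm solution, up to tolerance.
print(np.allclose(pinv_reg.coef_, lsqr_reg.coef_, atol=1e-6))
print(lsqr_reg.n_iter_)  # iteration count, useful for the convergence study
```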
Haha, I re-read @lorentzenchr's description of the issue:

So this is confirmed by my quick experiments.
Describe the bug
As noted in #22910, `Ridge(alpha=0, fit_intercept=True)` does not give the minimal norm solution for wide data, i.e. `n_features > n_samples`. Note that we nowhere guarantee that we provide the minimum norm solution.

Edit: Same seems to hold for `LinearRegression`, see #26164.

Probable Cause
For wide $X$, the least squares problem reads a bit differently: $\min \lVert w \rVert_2$ subject to $Xw = y$, with solution $w = X'(XX')^{-1} y$, see e.g. http://ee263.stanford.edu/lectures/min-norm.pdf.

With explicit intercept $w_0$, this reads $w = X'(XX' + 1 1')^{-1} y$ and $w_0 = 1'(XX' + 1 1')^{-1} y$, where $1$ is a column vector of ones.

This is incompatible with our mean centering approach.
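These formulas can be checked numerically; a small sketch with made-up data:

```python
import numpy as np

rng = np.random.RandomState(0)
X = rng.randn(5, 20)  # wide data
y = rng.randn(5)
ones = np.ones((5, 1))

# Minimum norm solution that includes the intercept in the norm:
# (w, w0) = A'(AA')^{-1} y with A = [X, 1], hence AA' = XX' + 11'.
K = X @ X.T + ones @ ones.T
v = np.linalg.solve(K, y)
w = X.T @ v
w0 = ones.T @ v  # equivalently v.sum()

# The recovered parameters interpolate the data exactly.
print(np.allclose(X @ w + w0, y))  # True
```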
Example
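The reproduction script itself did not survive this extraction; the following is a rough reconstruction of the kind of check the description refers to (data, shapes, and names are made up):

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.RandomState(0)
X = rng.randn(10, 100)  # wide data: n_features > n_samples
y = rng.randn(10)

ridge = Ridge(alpha=0, fit_intercept=True).fit(X, y)

# Reference minimum norm solution with an explicit intercept column.
X_aug = np.hstack([X, np.ones((X.shape[0], 1))])
params = np.linalg.pinv(X_aug) @ y
coef_ref, intercept_ref = params[:-1], params[-1]

print(np.allclose(ridge.coef_, coef_ref))
```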
This last statement should be `False`. It proves that Ridge does not give the minimum norm solution.