Sparse Ridge regression with intercept is incorrect #4710
ping @mblondel @meickenberg
The way the intercept fitting is currently implemented cannot work for sparse data, because the data sparsity would be lost when centering. The solution would IMO be to fit the intercept term directly in the solvers, without centering the data beforehand. For the sparse_cg solver, it might be possible to center the data just in time when computing the matrix-vector products.
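The just-in-time centering idea can be sketched as follows. This is an illustration added here, not code from the issue: it shows that the matrix-vector product of the centered matrix can be computed without ever materializing a dense centered matrix, which is the only operation a CG solver needs.

```python
import numpy as np
import scipy.sparse as sp

rng = np.random.RandomState(0)
X = sp.random(1000, 500, density=0.01, format="csr", random_state=rng)
mean = np.asarray(X.mean(axis=0)).ravel()

# Naively centering X (X - mean) produces a dense matrix. But a CG solver
# only needs matrix-vector products, and (X - 1 mean^T) @ v can be computed
# lazily as X @ v minus the scalar mean @ v, keeping X sparse throughout.
v = rng.randn(500)
jit_centered = X @ v - mean @ v

# Check against the explicit (dense) centered product.
dense_centered = (X.toarray() - mean) @ v
assert np.allclose(jit_centered, dense_centered)
```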
Updating all solvers is a lot of work. As a quick temporary fix, we could use an alternating minimization like this:

import numpy as np

from sklearn.linear_model import ridge_regression as _ridge_regression
from sklearn.utils.extmath import safe_sparse_dot


def ridge_regression(X, y, alpha, sample_weight=None, solver='auto',
                     fit_intercept=False, max_iter=None, tol=1e-3, verbose=0):
    if not fit_intercept:
        return _ridge_regression(X, y, alpha, sample_weight=sample_weight,
                                 solver=solver, max_iter=max_iter, tol=tol,
                                 verbose=verbose)

    intercept = 0
    for i in range(200):
        # Solve for coef with the current intercept held fixed.
        coef = _ridge_regression(X, y - intercept, alpha,
                                 sample_weight=sample_weight, solver=solver,
                                 max_iter=max_iter, tol=tol, verbose=verbose)
        y_pred = safe_sparse_dot(X, coef)

        # Then update the intercept as the (weighted) mean residual.
        intercept_old = intercept
        if sample_weight is not None:
            residual = sample_weight * (y - y_pred)
            intercept = np.sum(residual) / np.sum(sample_weight)
        else:
            residual = y - y_pred
            intercept = np.mean(residual)

        if abs(intercept - intercept_old) <= 1e-3:
            break

    return coef, intercept


if __name__ == '__main__':
    from sklearn.datasets import fetch_20newsgroups_vectorized

    bunch = fetch_20newsgroups_vectorized(subset="test")
    X = bunch.data
    y = bunch.target
    X = X[y <= 1]
    y = y[y <= 1]

    w, b = ridge_regression(X, y, alpha=1.0, fit_intercept=True,
                            solver="sparse_cg", max_iter=10)
    y_pred = safe_sparse_dot(X, w) + b
    print(np.mean((y - y_pred) ** 2))

    w = ridge_regression(X, y, alpha=1.0, solver="sparse_cg", max_iter=10)
    y_pred = safe_sparse_dot(X, w)
    print(np.mean((y - y_pred) ** 2))

Since the problem is jointly convex in coef and intercept, the procedure converges to the global minimum. An advantage is that this works with all solvers unmodified. There is just a bit of work to make sure that sample_weight and multi-output support work correctly.
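As a sanity check on the alternating scheme (an illustration added here, not from the thread): on a small dense problem it should reproduce the closed-form solution of ridge with an unpenalized intercept, which is obtained by centering X and y before solving the normal equations.

```python
import numpy as np

rng = np.random.RandomState(0)
X = rng.randn(50, 3)
y = X @ np.array([1.0, -2.0, 0.5]) + 3.0 + 0.01 * rng.randn(50)
alpha = 1.0

# Closed form: center X and y, solve (Xc^T Xc + alpha I) w = Xc^T yc,
# then recover intercept = mean(y) - mean(X) . w
Xc = X - X.mean(axis=0)
yc = y - y.mean()
w = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(3), Xc.T @ yc)
b = y.mean() - X.mean(axis=0) @ w

# Alternating minimization, as in the comment above: solve for coef with
# the intercept fixed, then set the intercept to the mean residual.
w2 = np.zeros(3)
b2 = 0.0
for _ in range(200):
    w2 = np.linalg.solve(X.T @ X + alpha * np.eye(3), X.T @ (y - b2))
    b_old = b2
    b2 = np.mean(y - X @ w2)
    if abs(b2 - b_old) <= 1e-8:
        break

assert np.allclose(w, w2, atol=1e-5)
assert abs(b - b2) < 1e-5
```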
That's a temporary fix. What is the overhead in practice for sparse data, for a CG solver for example?
If we also add warm_start support to
I guess we keep this open for the better fix? But I'll untag 0.17 as #5360 is merged.
As discussed with @GaelVaroquaux, I will implement intercept fitting with
I'm not sure the issue is fixed when X does not have zero mean; see for example:
@btel, what's the status of this?
So this should be closed? This issue implies that there was incorrect behaviour, not just that the use case wasn't supported.
Yes, the behaviour is fixed. Here are the results of the test from the top of the issue (by @TomDLT) when run locally against PR #13363, with alpha=0.
Note that I ran the test against PR #13363 (which is not yet merged).
Right. I'd forgotten those details. Thanks for clarifying. Let's close!
Ridge regression with fit_intercept=True does not give the same result if X is dense or sparse. The call to _center_data in _BaseRidge.fit should probably be a call to sparse_center_data.

Test example:

returns

while with alpha = 0:
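The dense/sparse discrepancy described in this report can be checked directly. In scikit-learn versions that include the fix (PR #13363 and later), both fits should agree; a minimal sketch, assuming a recent scikit-learn:

```python
import numpy as np
import scipy.sparse as sp
from sklearn.linear_model import Ridge

rng = np.random.RandomState(42)
X = rng.randn(20, 4)
X[X < 0.5] = 0.0  # mostly zeros, with clearly nonzero column means
y = rng.randn(20)

# Fit the same model on dense and sparse representations of X.
dense = Ridge(alpha=1.0, fit_intercept=True, solver="sparse_cg",
              tol=1e-10).fit(X, y)
sparse = Ridge(alpha=1.0, fit_intercept=True, solver="sparse_cg",
               tol=1e-10).fit(sp.csr_matrix(X), y)

# With the fix in place, coefficients and intercept match.
assert np.allclose(dense.coef_, sparse.coef_, atol=1e-4)
assert abs(dense.intercept_ - sparse.intercept_) < 1e-4
```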