Sparse Ridge regression with intercept is incorrect #4710


Closed
TomDLT opened this issue May 12, 2015 · 14 comments

@TomDLT (Member) commented May 12, 2015

Ridge regression with fit_intercept=True does not give the same result when X is dense as when X is sparse.
The call to _center_data in _BaseRidge.fit should probably be a call to sparse_center_data.

Test example:

import numpy as np
import scipy.sparse as sp
from sklearn.linear_model import Ridge
from sklearn.utils import safe_sqr


def get_pobj(w, intercept, myX, myy, alpha):
    # Ridge primal objective: 0.5 * ||X w + b - y||^2 + 0.5 * alpha * ||w||^2
    w = w.ravel()
    p = np.sum(safe_sqr(myX.dot(w) + intercept - myy)) / 2.
    p += alpha * w.dot(w) / 2.
    return p


def test_ridge(X, y, alpha):
    for solver in ["cholesky", "lsqr", "sparse_cg"]:
        clf = Ridge(alpha=alpha, tol=1.0e-15, solver=solver,
                    fit_intercept=True)
        clf.fit(X, y)
        print(get_pobj(clf.coef_, clf.intercept_, X, y, alpha))

alpha = 1.0
r = np.random.RandomState(42)
X = r.randn(100000, 2)
w = r.randn(2)
i = 10
y = np.dot(X, w) + i

print(get_pobj(w, i, X, y, alpha))

print "----Dense----"
test_ridge(X, y, alpha)

print "----Sparse---"
X = sp.csr_matrix(X)
test_ridge(X, y, alpha)

returns:

1.22411269359
----Dense----
1.22410049215
1.22410049215
1.22410049215
----Sparse---
5.52296274786
5.52296274786
5.52296274786

while with alpha = 0 it returns:

0.0
----Dense----
4.86608640337e-23
1.41900299631e-26
1.81890174989e-22
----Sparse---
4.2989480748
4.2989480748
4.2989480748
@agramfort (Member) commented May 12, 2015 via email

@mblondel (Member)

The way intercept fitting is currently implemented cannot work for sparse data, because the sparsity of the data would be lost when applying the centering.

The solution, IMO, would be to fit the intercept term directly in the solvers, without centering the data beforehand.

For the sparse_cg solver, it might be possible to center the data just in time when computing the matrix-vector products here:
https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/linear_model/ridge.py#L39

and here:
https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/linear_model/ridge.py#L44
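
Concretely, centering X only shifts the two matrix-vector products that the conjugate gradient solver needs: (X - 1 * X_mean^T) v = X v - (X_mean . v) * 1, and the analogous identity for the transpose. A minimal sketch of the idea, computed without ever densifying X (the helper names centered_matvec and centered_rmatvec are hypothetical, not scikit-learn internals):

import numpy as np
import scipy.sparse as sp


def centered_matvec(X, X_mean, v):
    # (X - 1 * X_mean^T) @ v = X @ v - (X_mean . v) * 1
    return X.dot(v) - np.dot(X_mean, v)


def centered_rmatvec(X, X_mean, u):
    # (X - 1 * X_mean^T)^T @ u = X^T @ u - X_mean * sum(u)
    return X.T.dot(u) - X_mean * u.sum()


# sanity check against explicit dense centering
rng = np.random.RandomState(0)
X = sp.random(50, 5, density=0.3, random_state=rng, format="csr")
X_mean = np.asarray(X.mean(axis=0)).ravel()
v, u = rng.randn(5), rng.randn(50)
Xc = X.toarray() - X_mean
assert np.allclose(centered_matvec(X, X_mean, v), Xc.dot(v))
assert np.allclose(centered_rmatvec(X, X_mean, u), Xc.T.dot(u))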

@amueller amueller added the Bug label May 12, 2015
@amueller amueller added this to the 0.17 milestone May 12, 2015
@mblondel (Member)

Updating all solvers is a lot of work. As a quick temporary fix, we could use an alternating minimization scheme like this:

import numpy as np

from sklearn.linear_model import ridge_regression as _ridge_regression
from sklearn.utils.extmath import safe_sparse_dot


def ridge_regression(X, y, alpha, sample_weight=None, solver='auto',
                     fit_intercept=False, max_iter=None, tol=1e-3, verbose=0):

    if not fit_intercept:
        return _ridge_regression(X, y, alpha, sample_weight=sample_weight,
                                 solver=solver, max_iter=max_iter, tol=tol,
                                 verbose=verbose)


    # Alternate between solving the ridge problem with the intercept fixed
    # and updating the intercept in closed form with the coefficients fixed.
    intercept = 0
    for i in range(200):
        coef = _ridge_regression(X, y - intercept, alpha,
                                 sample_weight=sample_weight, solver=solver,
                                 max_iter=max_iter, tol=tol, verbose=verbose)

        y_pred = safe_sparse_dot(X, coef)
        intercept_old = intercept
        # Closed-form intercept update: the (weighted) mean of the residuals
        # minimizes the objective with respect to the intercept.
        if sample_weight is not None:
            residual = sample_weight * (y - y_pred)
            intercept = np.sum(residual) / np.sum(sample_weight)
        else:
            residual = y - y_pred
            intercept = np.mean(residual)

        if abs(intercept - intercept_old) <= 1e-3:
            break

    return coef, intercept


if __name__ == '__main__':
    from sklearn.datasets import fetch_20newsgroups_vectorized

    bunch = fetch_20newsgroups_vectorized(subset="test")
    X = bunch.data
    y = bunch.target
    X = X[y <= 1]
    y = y[y <= 1]

    w, b = ridge_regression(X, y, alpha=1.0, fit_intercept=True,
                            solver="sparse_cg", max_iter=10)
    y_pred = safe_sparse_dot(X, w) + b
    print(np.mean((y - y_pred) ** 2))

    w = ridge_regression(X, y, alpha=1.0, solver="sparse_cg", max_iter=10)
    y_pred = safe_sparse_dot(X, w)
    print(np.mean((y - y_pred) ** 2))

Since the problem is jointly convex in coef and intercept, the procedure converges to the global minimum: each intercept update is the exact minimizer of the objective with the coefficients held fixed (the (weighted) mean of the residuals), so every pass decreases the objective. An advantage is that this works with all solvers unmodified. There is just a bit of work left to make sure that sample_weight and multi-output support work correctly.
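
A quick sanity check (a sketch, reusing the ridge_regression wrapper defined above on a small dense problem; the comparison is only approximate, since the loop stops once the intercept moves by less than 1e-3):

import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.RandomState(0)
X = rng.randn(500, 3)
y = X.dot(rng.randn(3)) + 5.0

# reference: the built-in dense path with intercept
clf = Ridge(alpha=1.0, fit_intercept=True).fit(X, y)

# the alternating scheme defined above
w, b = ridge_regression(X, y, alpha=1.0, fit_intercept=True)

# both differences should be small, on the order of the stopping tolerance
print(np.abs(w - clf.coef_).max())
print(abs(b - clf.intercept_))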

@agramfort (Member) commented May 14, 2015 via email

@mblondel (Member)

If we also add warm_start support to ridge_regression, I would expect the alternating minimization procedure to be pretty fast.

@amueller (Member)

I guess we keep this open for the better fix? But I'll untag the 0.17 milestone, as #5360 is merged.

@amueller amueller removed this from the 0.17 milestone Oct 14, 2015
@amueller amueller modified the milestone: 0.19 Sep 29, 2016
@jnothman jnothman modified the milestones: 0.20, 0.19 Jun 14, 2017
@glemaitre glemaitre modified the milestones: 0.20, 0.21 Jun 13, 2018
@btel (Contributor) commented Feb 27, 2019

As discussed with @GaelVaroquaux, I will implement intercept fitting with sparse_cg.

@jeromedockes (Contributor)

I'm not sure the issue is fixed when X does not have zero mean. Note that make_regression with bias=10, as used in the test, produces a y with non-zero mean but an X with zero mean.

See for example:

#13325 (comment)
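
This is easy to check (a quick sketch; make_regression draws X from a standard normal, so the column means of X are only approximately zero, while bias shifts the mean of y):

import numpy as np
from sklearn.datasets import make_regression

X, y = make_regression(n_samples=100000, n_features=2, bias=10.0,
                       random_state=0)
print(np.abs(X.mean(axis=0)).max())  # close to 0: X is (approximately) centered
print(y.mean())                      # far from 0, because of bias=10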

@jnothman (Member)

> As discussed with @GaelVaroquaux, I will implement intercept fitting with sparse_cg.

@btel, what's the status of this?

@jnothman jnothman modified the milestones: 0.21, 0.22 Apr 15, 2019
@btel (Contributor) commented Apr 16, 2019

@jnothman intercept with sparse_cg was implemented in PR #13336 (merged).

@jnothman (Member)

So this should be closed? This issue implies that there was incorrect behaviour, not just that the use case wasn't supported.

@btel (Contributor) commented Apr 16, 2019

Yes, the behaviour is fixed. Here are the results of the test from the top of the issue (by @TomDLT) when run locally against PR #13363:
1.224112693590405
----Dense----
1.2241004921533805
1.2241004921533807
1.2241004921533807
----Sparse---
/home/bartosz/repos/libraries/scikit-learn/sklearn/linear_model/ridge.py:387: UserWarning: In Ridge, only 'sag' solver can directly fit the intercept. Solver has been automatically changed into 'sag'.
  warnings.warn("In Ridge, only 'sag' solver can directly fit the "
1.2241004921533807
/home/bartosz/repos/libraries/scikit-learn/sklearn/linear_model/ridge.py:387: UserWarning: In Ridge, only 'sag' solver can directly fit the intercept. Solver has been automatically changed into 'sag'.
  warnings.warn("In Ridge, only 'sag' solver can directly fit the "
1.2241004921533807
1.2241004921533807

with alpha=0:
0.0
----Dense----
1.539028183040933e-26
9.320391595186254e-28
4.7141538835102444e-26
----Sparse---
/home/bartosz/repos/libraries/scikit-learn/sklearn/linear_model/ridge.py:387: UserWarning: In Ridge, only 'sag' solver can directly fit the intercept. Solver has been automatically changed into 'sag'.
  warnings.warn("In Ridge, only 'sag' solver can directly fit the "
1.4816503850996827e-25
/home/bartosz/repos/libraries/scikit-learn/sklearn/linear_model/ridge.py:387: UserWarning: In Ridge, only 'sag' solver can directly fit the intercept. Solver has been automatically changed into 'sag'.
  warnings.warn("In Ridge, only 'sag' solver can directly fit the "
7.156948451226682e-25
6.591011749212076e-26

Note that I ran the test against PR #13363 (which is not yet merged).

@TomDLT (Member, Author) commented Apr 17, 2019

More precisely, it is fixed for the "sparse_cg" (#13336) and "sag" solvers. It is not fixed for the "cholesky" and "lsqr" solvers; calling these solvers on sparse data with fit_intercept=True results in using the "sag" solver instead (with a warning, see #5360).
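
As a minimal check (a sketch, assuming a scikit-learn release that includes #13336), the dense and sparse paths should now agree when sparse_cg fits the intercept directly:

import numpy as np
import scipy.sparse as sp
from sklearn.linear_model import Ridge

rng = np.random.RandomState(42)
X = rng.randn(1000, 2)
y = X.dot(rng.randn(2)) + 10.0

clf_dense = Ridge(alpha=1.0, solver="sparse_cg", fit_intercept=True).fit(X, y)
clf_sparse = Ridge(alpha=1.0, solver="sparse_cg", fit_intercept=True).fit(
    sp.csr_matrix(X), y)

# coefficients and intercepts should match up to solver tolerance
print(np.abs(clf_dense.coef_ - clf_sparse.coef_).max())
print(abs(clf_dense.intercept_ - clf_sparse.intercept_))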

@jnothman (Member)

Right. I'd forgotten those details. Thanks for clarifying. Let's close!
