8000 Working feature is falsely suppressed [Lasso & ElasticNet for sparse matrices with weights] · Issue #21700 · scikit-learn/scikit-learn · GitHub
[go: up one dir, main page]

Skip to content

Working feature is falsely suppressed [Lasso & ElasticNet for sparse matrices with weights] #21700

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
draphi opened this issue Nov 17, 2021 · 9 comments

Comments

@draphi
Copy link
draphi commented Nov 17, 2021

Describe the bug

Please remove the checks to allow sparse matrices in Lasso and ElasticNet since it actually is already supported.

             raise ValueError(
                    "Sample weights do not (yet) support sparse matrices."
                )

Relevant files are:
ElasticNet.fit(...):
l.956 sklearn/linear_model/_coordinate_descent.py

See
https://github.com/scikit-learn/scikit-learn/compare/main...draphi:sample_weights_sparse_matrix?expand=1

Steps/Code to Reproduce

import sklearn.linear_model as lm
import scipy.sparse as sp
import numpy as np
N = 5
NN = 3
X = sp.csc_matrix((range(NN), (range(NN), range(NN))), shape=(N,N))
W = np.ones(N).reshape(-1)
y = np.array(range(N)).reshape(-1)
mdl = lm.Lasso(.1) # Same for lm.ElasticNet()
mdl.fit(X, y, W)
mdl.coef_

Expected Results

array([ 0.   , -0.625, -0.   ,  0.   ,  0.   ])

Actual Results

ValueError: Sample weights do not (yet) support sparse matrices.

Versions

System:
python: 3.7.4 (default, Aug 9 2019, 18:34:13) [MSC v.1915 64 bit (AMD64)]
executable: C:\Python\conda\python.exe
machine: Windows-10-10.0.19041-SP0

Python dependencies:
pip: 19.2.3
setuptools: 41.4.0
sklearn: 1.0.1
numpy: 1.21.4
scipy: 1.7.2
Cython: 0.29.13
pandas: 1.3.4
matplotlib: 3.1.1
joblib: 0.13.2
threadpoolctl: 3.0.0

Built with OpenMP: True

@glemaitre
Copy link
Member

I don't think that this is right. You might get a result but it is probably not the good one. The function _pre_fit does not take into account the weights with sparse data:

if sparse.isspmatrix(X):
# copy is not needed here as X is not modified inplace when X is sparse
precompute = False
X, y, X_offset, y_offset, X_scale = _preprocess_data(
X,
y,
fit_intercept=fit_intercept,
normalize=normalize,
copy=False,
return_mean=True,
check_input=check_input,
)
else:
# copy was done in fit if necessary
X, y, X_offset, y_offset, X_scale = _preprocess_data(
X,
y,
fit_intercept=fit_intercept,
normalize=normalize,
copy=copy,
check_input=check_input,
sample_weight=sample_weight,
)
if sample_weight is not None:
X, y = _rescale_data(X, y, sample_weight=sample_weight)

pinging @agramfort @lorentzenchr that worked on this and might recall if this is the real issue for limiting sample_weight to only dense matrices.

@draphi
Copy link
Author
draphi commented Nov 18, 2021

Thanks for looking into this. I have checked that the method yields the same results whether the matrix X is dense or not, so either there is a bigger problem but if there is, it isnt related to a sparse vs dense system matrix X unless I am missing something (I have edited the code to show that using weights changes the results).


import sklearn.linear_model as lm
import scipy.sparse as sp
import numpy as np
N = 100
NN = 20

args = .1

for algo in [lm.ElasticNet, lm.Lasso]:
    rows, cols, data = range(NN), range(NN), range(NN)
    rows, cols, data = np.random.randint(N, size=NN), np.random.randint(N, size=NN), np.random.randint(N, size=NN)

    X = sp.csc_matrix((data, (rows, cols)), shape=(N,N))
    Xd = X.todense()

    W = np.exp(np.random.randn(N)).reshape(-1)
    y = np.array(range(N)).reshape(-1)

    mdl_sparse = algo(args)
    mdl_sparse.fit(X, y, W)
    mdl_dense = algo(args)
    mdl_dense.fit(X, y, W)

    assert np.allclose(mdl_sparse.coef_,  mdl_dense.coef_)
    assert(np.sum(W)!=N)
    
    mdl_sparse_no_weight = algo(args)
    mdl_sparse_no_weight.fit(X, y)
    assert not np.allclose(mdl_sparse.coef_,  mdl_sparse_no_weight.coef_)

@agramfort
Copy link
Member

you have a typo. It should be:

mdl_dense.fit(Xd, y, W)

@agramfort
Copy link
Member

and fixing this your test don't pass anymore

@draphi
Copy link
Author
draphi commented Nov 18, 2021

thanks for pointing this out, is there interest to fix it?

@agramfort
Copy link
Member
agramfort commented Nov 19, 2021 via email

@glemaitre
Copy link
Member
glemaitre commented Nov 22, 2021

@agramfort to be sure, we don't currently support this case. Basically, there is no bug but it would be an enhancement then?

@agramfort
Copy link
Member
agramfort commented Nov 24, 2021 via email
7C73

@lorentzenchr
Copy link
Member

Lasso/Enet + sample weights + sparse input is currently the only missing combination to close #3702. Contributions would be very welcome.

I'm therefore closing this issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants
0