LinearRegression fits wrongly on csr sparse matrix with sample weights #19578
Comments
OTOH, here is a simple example that does work:
X = sp.sparse.csr_matrix(np.array([[1, 0]] * 100 + [[0, 1]] * 100))
y = np.array([1] * 80 + [0] * 20 + [1] * 50 + [0] * 50)
w = np.array([1] * 80 + [4] * 20 + [1] * 50 + [2] * 50)
LinearRegression(fit_intercept=False).fit(X, y, sample_weight=w).coef_
=> array([0.5 , 0.33333333])
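The values in that example can be checked by hand: with two indicator features and no intercept, each coefficient is the weighted mean of y over the rows where that feature is active. A minimal sketch of that check, using only NumPy (the name `coef_expected` is introduced here for illustration):

```python
import numpy as np

# Same y and w as in the example above. The first 100 rows activate
# feature 0, the last 100 rows activate feature 1.
y = np.array([1] * 80 + [0] * 20 + [1] * 50 + [0] * 50)
w = np.array([1] * 80 + [4] * 20 + [1] * 50 + [2] * 50)

# Weighted mean of y within each group of rows.
coef_expected = np.array([
    np.average(y[:100], weights=w[:100]),    # 80 / (80 + 4 * 20) = 0.5
    np.average(y[100:], weights=w[100:]),    # 50 / (50 + 2 * 50) = 1/3
])
print(coef_expected)
```

This agrees with the reported `coef_`, so the sparse weighted fit is correct for this particular input.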
Does it look like the scale of the weights might play a part? Can you check how well it is working with weights of different scales?
No, it doesn't. Dividing them by 100 or 10000 still gives different results for the sparse/weighted case. Using an all-ones weight vector works fine, but as soon as I change some of the weights to, say, 2, the sparse/weighted case fails again.
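For reference, an exact weighted least squares solution should not depend on the overall scale of the weights, since multiplying every weight by a constant multiplies the objective by that constant. A minimal sketch of that property, using synthetic data and a hypothetical helper `wls` (neither is from the original report):

```python
import numpy as np

def wls(X, y, w):
    """Weighted least squares without intercept, via sqrt(w) row rescaling."""
    sw = np.sqrt(w)
    coef, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    return coef

# Synthetic, well-conditioned data (an assumption for illustration only).
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = rng.normal(size=50)
w = rng.integers(1, 5, size=50).astype(float)

coef_base = wls(X, y, w)
coef_scaled = wls(X, y, w / 10000)  # same weights, divided by a constant

print(np.max(np.abs(coef_base - coef_scaled)))
```

Any remaining difference between the two results is floating-point noise, which is consistent with weight scale not being the cause here.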
Another fact to take into account is that the matrix X'X is very ill-conditioned. If some kind of iterative solver is chosen for sparse input (I don't know if that's the case), then you have problems on both fronts: observations differ wildly in weight while features have very different "geometries", so it might be very difficult to find a step schedule that converges.
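One way to quantify the conditioning concern is to compute the condition number of the weighted Gram matrix X'WX directly. The sketch below uses synthetic features on deliberately different scales as a stand-in for the real dataset, which is not shown in this thread; all names and values are assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200

# Hypothetical features on very different scales.
X = np.column_stack([
    rng.normal(size=n),
    1e4 * rng.normal(size=n),
    1e-3 * rng.normal(size=n),
])
# Hypothetical weights that differ by three orders of magnitude.
w = rng.choice([1.0, 1000.0], size=n)

# Condition number of the weighted Gram matrix X' W X.
gram = X.T @ (w[:, None] * X)
cond = np.linalg.cond(gram)

# Same computation after scaling each column to unit standard deviation.
X_std = X / X.std(axis=0)
cond_std = np.linalg.cond(X_std.T @ (w[:, None] * X_std))

print(f"raw: {cond:.3e}, standardized columns: {cond_std:.3e}")
```

A large value for the raw matrix and a much smaller one after column scaling would suggest that feature scaling, rather than the weights alone, dominates the conditioning.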
Hi @memeplex, I believe your findings are similar to #15438, in particular as described in @lorentzenchr's comment.
If so, can we close?
Describe the bug
Fitting a model from a csr sparse matrix while passing sample_weight gives a very wrong fit.
This report is for LinearRegression, but I've been experiencing wrong behavior in a similar setup with Ridge.
Steps/Code to Reproduce
Taken from a small sample of a real life dataset, just copy&paste:
Expected Results
Both weighted results are the same.
Both unweighted results are the same.
Actual Results
Weighted results differ. The one for the sparse matrix is clearly wrong.
Unweighted results are the same.
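The comparison described in this report can be sketched as follows. This is not the original reproduction: it uses small synthetic data and hypothetical names (`fit_coef`, `diff_weighted`, `diff_unweighted`), and well-conditioned random data like this may not trigger the discrepancy at all.

```python
import numpy as np
from scipy import sparse
from sklearn.linear_model import LinearRegression

# Synthetic stand-in data (an assumption, not the reporter's dataset).
rng = np.random.default_rng(0)
X_dense = rng.normal(size=(100, 4))
X_dense[rng.random(X_dense.shape) < 0.5] = 0.0   # introduce sparsity
X_sparse = sparse.csr_matrix(X_dense)
y = rng.normal(size=100)
w = rng.choice([1.0, 2.0], size=100)

def fit_coef(X, sample_weight=None):
    """Fit LinearRegression and return coefficients followed by the intercept."""
    model = LinearRegression().fit(X, y, sample_weight=sample_weight)
    return np.append(model.coef_, model.intercept_)

# Largest absolute difference between dense and sparse fits.
diff_weighted = np.max(np.abs(fit_coef(X_dense, w) - fit_coef(X_sparse, w)))
diff_unweighted = np.max(np.abs(fit_coef(X_dense) - fit_coef(X_sparse)))

print(f"weighted: {diff_weighted:.3e}, unweighted: {diff_unweighted:.3e}")
```

Both differences should be near zero if the dense and sparse code paths agree; a large weighted difference alongside a small unweighted one would match the behavior reported here.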
Versions