Unexpected class_weight behavior in RidgeClassifier · Issue #1489 · scikit-learn/scikit-learn · GitHub

Closed
amueller opened this issue Dec 24, 2012 · 28 comments

@amueller
Member

While testing the class_weight parameter in several estimators to investigate #1411, I noticed that the parameter doesn't work as I would expect in RidgeClassifier either.
Maybe this is some misunderstanding on my part or some regularization issue, but with noisy labels I would have expected to be able to move the decision boundary.

This is not the case, as illustrated in this notebook.

Any input would be appreciated.
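
For a self-contained illustration of the expectation (a sketch on synthetic data, not the notebook linked above; with a fixed implementation, upweighting class 1 should increase the fraction of points predicted as class 1):

from sklearn.datasets import make_classification
from sklearn.linear_model import RidgeClassifier

# Overlapping classes with 20% label noise: upweighting class 1 should pull
# the decision boundary so that more points are predicted as class 1.
X, y = make_classification(n_samples=200, n_features=2, n_redundant=0,
                           flip_y=0.2, random_state=0)
for cw in (None, {0: 1, 1: 3}, {0: 1, 1: 10}):
    clf = RidgeClassifier(class_weight=cw).fit(X, y)
    print(cw, (clf.predict(X) == 1).mean())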

@amueller
Member Author

I think I found the explanation in #1190. Class weights and sample weights are only supported for a certain solver; otherwise, class_weight and sample_weight are silently ignored. Also, this is not documented anywhere.
Moreover, using RidgeClassifierCV, it is not possible to actually specify the solver as far as I can tell. (A quick way to detect such silent ignoring is sketched below.)
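
A sketch of such a check (not the project's actual test; in a version where the weights are honored, the two fits differ and this prints False):

import numpy as np
from sklearn.datasets import make_blobs
from sklearn.linear_model import RidgeClassifier

X, y = make_blobs(centers=2, cluster_std=1.3, random_state=0)
plain = RidgeClassifier(solver="lsqr").fit(X, y)
weighted = RidgeClassifier(solver="lsqr", class_weight={0: 10, 1: 1}).fit(X, y)
# If class_weight were silently dropped for this solver, the fits would match.
print(np.allclose(plain.coef_, weighted.coef_))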

@amueller
Member Author

Ok, I was wrong. Actually, if sample_weight is specified, the solver is silently overwritten to be dense_cholesky.
So I have no idea why the sample weights don't work as I expect.

@agramfort
Member

I tried this script. The behavior is indeed a bit counterintuitive: the lines always cross, so the hyperplane cannot really be shifted...

import numpy as np
import pylab as pl
from sklearn.datasets import make_blobs
from sklearn.linear_model import RidgeClassifier

X, y = make_blobs(centers=2, cluster_std=1.3, random_state=0)
pl.prism()
pl.scatter(X[:, 0], X[:, 1], c=y)
y = 2 * y - 1  # remap labels from {0, 1} to {-1, 1}

fit_intercept = True
clf0 = RidgeClassifier(fit_intercept=fit_intercept)
clf0.fit(X, y)
print(clf0.intercept_)
# labels are now -1 / 1, so the class_weight dicts must use those keys
clf1 = RidgeClassifier(class_weight={-1: 1e-10, 1: 1e10}, fit_intercept=fit_intercept)
clf1.fit(X, y)
print(clf1.intercept_)
clf2 = RidgeClassifier(class_weight={-1: 1e10, 1: 1e-10}, fit_intercept=fit_intercept)
clf2.fit(X, y)
print(clf2.intercept_)

def plot_line(clf):
    # decision boundary w0 * x0 + w1 * x1 + intercept = 0, solved for x1
    w = clf.coef_.ravel()
    a = -w[0] / w[1]
    xx = np.linspace(-30, 30, 100)
    yy = a * xx - clf.intercept_ / w[1]
    pl.plot(xx, yy)

pl.scatter(X[:, 0], X[:, 1], c=y)
plot_line(clf0)
plot_line(clf1)
plot_line(clf2)
pl.grid(True)
pl.show()

@agramfort
Member

A good test could be to duplicate the instances of one class and see if it gives the same result as a class_weight of 2 for that class; a sketch follows below.
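
A sketch of that check on the blob data from the script above (with a correct implementation, a class weight of 2 should match duplicating the class's samples exactly, since the penalty term is unchanged):

import numpy as np
from sklearn.datasets import make_blobs
from sklearn.linear_model import RidgeClassifier

X, y = make_blobs(centers=2, cluster_std=1.3, random_state=0)

# Weighting class 1 by 2 ...
clf_w = RidgeClassifier(class_weight={0: 1, 1: 2}).fit(X, y)

# ... should give the same fit as duplicating every class-1 sample.
X_dup = np.vstack([X, X[y == 1]])
y_dup = np.concatenate([y, y[y == 1]])
clf_d = RidgeClassifier().fit(X_dup, y_dup)

print(np.allclose(clf_w.coef_, clf_d.coef_),
      np.allclose(clf_w.intercept_, clf_d.intercept_))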

@amueller
Member Author

For the lazy ones: this is the output of Alex's script:

(image: output plot of the script above)

This is indeed a bit odd. I tried an example where the boundary is very noisy; in that case I would really expect that modifying the weights shifts the boundary. I'll give your idea a try.

@mblondel
Member

We need to rederive the closed-form solution when sample weights are used and check whether the current code is correct. I'm on holiday, so it's a bit difficult for me to check this right now. I will try to do it later if no one else has done it before then.

@weilinear
Member

I think Ridge regression has this bug as well. In ridge_regression():

A = np.dot(X, X.T)
A.flat[::n_samples + 1] += alpha * sample_weight  # adds alpha * sample_weight to the diagonal

The sample_weight is multiplied by the regularization parameter, which seems a little weird to me.

@GaelVaroquaux
Member

Disclaimer: I never gave a thought to sample weights before today, so the following might be naive/wrong.

I did a little back-of-the-envelope calculation, and it seems to me that the solution of the ridge with sample weights simply corresponds to (after having centered/normalized the data) multiplying X and y by np.sqrt(sample_weight).

I can somewhat relate that to what is being done (via a bit of clever algebra, as the solver uses the kernel trick), but it seems to me that it should be alpha / sample_weight.

Anyhow, I am not sure that I like the approach taken, as it forces the dense_cholesky solver and the kernel trick when sample weights are given. @mblondel, you wrote this code; what do you think?

I'll first write a test comparing ridge_regression with sample weights to ridge_regression without sample weights but with X and y rescaled, and then I'll see. (A numerical check of both claims is sketched below.)
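
A quick numerical check of both claims with plain NumPy on random data (a sketch, not scikit-learn code):

import numpy as np

rng = np.random.RandomState(0)
n, d = 20, 5
X, y = rng.randn(n, d), rng.randn(n)
w = rng.rand(n) + 0.5  # sample weights
alpha = 1.0

# Reference: weighted ridge normal equations, (X^T W X + alpha I) coef = X^T W y
ref = np.linalg.solve(X.T @ (w[:, None] * X) + alpha * np.eye(d), X.T @ (w * y))

# Claim 1: rescaling X and y by sqrt(w) gives the same solution.
sw = np.sqrt(w)
Xs, ys = sw[:, None] * X, sw * y
rescaled = np.linalg.solve(Xs.T @ Xs + alpha * np.eye(d), Xs.T @ ys)

# Claim 2: in the dual, the diagonal term is alpha / w, not alpha * w:
# coef = X^T (X X^T + alpha * diag(1 / w))^{-1} y
dual = X.T @ np.linalg.solve(X @ X.T + alpha * np.diag(1.0 / w), y)

print(np.allclose(ref, rescaled), np.allclose(ref, dual))  # True True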

@GaelVaroquaux
Member

By the way, sample weights are so far pretty much untested :(

@GaelVaroquaux
Member

I have added the test:
https://github.com/GaelVaroquaux/scikit-learn/tree/ridge_sample_weights

And it pretty clearly fails. Is my testing logic wrong, or is there a bug?

@mblondel
Member

> I did a little back-of-the-envelope calculation, and it seems to me that the solution of the ridge with sample weights simply corresponds to (after having centered/normalized the data) multiplying X and y by np.sqrt(sample_weight).

See #1190

This is the method of choice for solver="lsqr" (since we don't control it) and for solver="dense_cholesky" in the primal / linear ridge case: X.T X is an (n_features, n_features) matrix, so I don't see any other way than multiplying X by sqrt(sample_weights) beforehand.

For solver="dense_cholesky" in the dual / kernel ridge case, I will check the validity of the current code.
For solver="sparse_cg", there might be a way to prevent a copy of X.

The current situation (forcing dense_cholesky) is clearly not good but this is because originally Ridge only supported one solver.

One thing that I also want to check is whether all solvers really solve the same objective: some might solve 0.5 ||Xw - y||^2 and others ||Xw - y||^2, and likewise for the regularization term. To be checked (a quick comparison is sketched below).
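
A quick way to run that comparison (a sketch using today's solver names; at the time of this issue the Cholesky solver was called dense_cholesky):

import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge

X, y = make_regression(n_samples=50, n_features=5, random_state=0)
base = Ridge(alpha=1.0, solver="cholesky").fit(X, y).coef_
for solver in ("svd", "lsqr", "sparse_cg"):
    coef = Ridge(alpha=1.0, solver=solver, tol=1e-10).fit(X, y).coef_
    # If all solvers minimize the same objective, the coefficients agree.
    print(solver, np.allclose(coef, base, atol=1e-6))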

@mblondel
Member

The current code in the dual / kernel ridge case is wrong :-( Here's the corrected code (with more meaningful variable names and without a copy of X):

import numpy as np
from scipy import linalg
from sklearn.utils.extmath import safe_sparse_dot

K = safe_sparse_dot(X, X.T, dense_output=True)  # linear kernel Gram matrix
sw = np.sqrt(sample_weights)
K *= np.outer(sw, sw)             # equivalent to rescaling the rows of X by sw
K.flat[::n_samples + 1] += alpha  # add alpha to the diagonal
dual_coef = linalg.solve(K, y * sw, sym_pos=True, overwrite_a=True)
dual_coef *= sw                   # coef = (sw[:, None] * X).T @ dual, so scale the dual back
coef = safe_sparse_dot(X.T, dual_coef, dense_output=True)

Gael, can you push the fix for the dual case together with your test (I want to go to bed, it's very late here)?

@GaelVaroquaux
Member

@mblondel : thanks for confirming our intuition. I'll push the fix tonight and check that my tests are correct.

@GaelVaroquaux
Member

OK, I have implemented your fix (and I agree with the calculations). It passes my tests. The bad news is that I now have a failing test: test_class_weights.

That test looks a bit dodgy to me, and I cannot tell whether it is right or not. I need to run right now, but I'll look at it a bit more.

@amueller
Member Author

I recently added the test to ensure consistent semantics. LinearSVC did the opposite of SVC before. Do you know a better test that will work for all classifiers?


@GaelVaroquaux
Member

> I recently added the test to ensure consistent semantics.

I am looking at the test, and after a closer look, it looks OK to me. I don't know why the new code is posing a problem. I need to investigate.

> Do you know a better test that will work for all classifiers?

No. :(

@GaelVaroquaux
Member

Correction to what I said yesterday evening: I goofed up with branch management in git (doing too many things in parallel). My new tests do not pass with this code. I still need to look at things a bit more closely.

@GaelVaroquaux
Member

By the way, I am doing all the changes in my ridge_sample_weights branch.

@GaelVaroquaux
Member

Alright, I have found a bug and fixed my test on sample weights. However, the class_weight test is still failing.

@GaelVaroquaux
Member

To come back to the original problem raised earlier in this issue, and to the test that is currently failing: I believe the problem is that the decision boundary is not shifted. In other words, changing sample weights does change the direction of the boundary, but not its offset.

To me, it is simply that the intercept is unaffected by the sample weights. Do people agree with this analysis? (A quick probe of this is sketched below.)
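
A quick probe of this hypothesis (a sketch against the fixed code; if the intercept responds to the weights, it should move in opposite directions for the two weightings):

from sklearn.datasets import make_blobs
from sklearn.linear_model import RidgeClassifier

X, y = make_blobs(centers=2, cluster_std=1.3, random_state=0)
for cw in (None, {0: 10, 1: 1}, {0: 1, 1: 10}):
    clf = RidgeClassifier(class_weight=cw).fit(X, y)
    print(cw, clf.intercept_)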

@GaelVaroquaux
Member

I am unsure about my last comment: modifying the SVM example plot_separating_hyperplane_unbalanced.py to use a ridge classifier seems to work well.

@amueller
Member Author

It is a bit weird that this example works but the test does not. Without looking at the code, the intercept explanation sounded plausible. Does the weight change the ratio of the classes in the test at all?


@mblondel
Member

@GaelVaroquaux Thanks for tackling this.

I had a quick look at the gradient of the primal objective. I was hoping that for gradient-based iterative solvers we could get around copying X, but apparently it cannot be avoided. So the dual is the only case where we can avoid a copy.

Regarding the implementation of #1190, it's trivial in the dense case but may require a bit of work in the sparse case (I don't think CSR sparse matrices support element-wise multiplication with a broadcast 1d numpy array); see the workaround sketched below.
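
For reference, a standard workaround is to multiply by a sparse diagonal matrix on the left, or to scale X.data in place using the indptr row boundaries (a sketch, not necessarily the code that went into the fix):

import numpy as np
import scipy.sparse as sp

X = sp.rand(6, 4, density=0.5, format="csr", random_state=0)
sw = np.sqrt(np.arange(1.0, 7.0))  # sqrt of per-sample weights

X_scaled = sp.diags(sw) @ X  # rows scaled by sw, stays sparse
# In-place alternative: X.data *= np.repeat(sw, np.diff(X.indptr))
print(np.allclose(X_scaled.toarray(), sw[:, None] * X.toarray()))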

@amueller
Member Author

Is there a chance to fix that for the release? I'm not that familiar with the method and would like to avoid digging too deep today.

@GaelVaroquaux
Member

> Is there a chance to fix that for the release? I'm not that familiar with the method and would like to avoid digging too deep today.

I'll try to finish it off today. I'll give it a couple of hours, and if I cannot get it to work, I'll move on to something else.

@amueller
Member Author

Great, thanks.

@GaelVaroquaux
Member

Alright, I think I have fixed all the problems. I'll open a PR in a minute. All the tests pass, and the fishy example reported by @agramfort now gives something consistent (I had to reduce the weights from 1e10 to 1e1, as otherwise it was too violent):

(image: class_weights plot after the fix)

@GaelVaroquaux
Member

Fixed in PR #1600
