Unexpected class_weight behavior in RidgeClassifier #1489

While testing the class_weight parameter in several estimators to investigate #1411, I noticed that the parameter doesn't work as I would expect in RidgeClassifier either. Maybe this is a misunderstanding on my part, or a regularization issue, but if I have noisy labels I would expect to be able to move the decision boundary. This is not the case, as illustrated in this notebook. Any input would be appreciated.

Comments
I think I found the solution in #1190: class weights and sample weights are only supported for a certain solver. Otherwise …
OK, I was wrong. Actually, if …
I tried this script. The behavior is indeed a bit counterintuitive, as the lines always cross, so the hyperplane cannot really be shifted...
A good test could be to duplicate the instances of one class and see if it gives the same result as with a class_weight of 2 for this class.
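A sketch of that duplication check (the data and names are mine, not from the thread, and whether the assertions actually pass is exactly what this issue is investigating):

```python
import numpy as np
from sklearn.linear_model import RidgeClassifier

rng = np.random.RandomState(42)
# Two Gaussian blobs, classes 0 and 1.
X = np.r_[rng.randn(20, 2) - 1, rng.randn(20, 2) + 1]
y = np.r_[np.zeros(20), np.ones(20)]

# Fit with class 1 weighted twice as much...
clf_weighted = RidgeClassifier(class_weight={0: 1, 1: 2}).fit(X, y)

# ...and fit on data where every class-1 instance is duplicated.
X_dup = np.r_[X, X[y == 1]]
y_dup = np.r_[y, y[y == 1]]
clf_dup = RidgeClassifier().fit(X_dup, y_dup)

# With a correct implementation these should coincide, since duplicating
# a sample doubles its term in the loss while alpha stays fixed.
np.testing.assert_array_almost_equal(clf_weighted.coef_, clf_dup.coef_)
np.testing.assert_array_almost_equal(clf_weighted.intercept_, clf_dup.intercept_)
```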
We need to rederive the closed-form solution when sample weights are used and check whether the current code is correct. I'm on holiday, so it's a bit difficult for me to check this right now. I will try to do it later if no one else has done it before.
I think the problem is in this code:

```python
A = np.dot(X, X.T)
A.flat[::n_samples + 1] += alpha * sample_weight
```

The …
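For readers unfamiliar with the `flat` idiom above: stepping through the flattened array with a stride of `n + 1` lands exactly on the diagonal entries of a square matrix. A tiny standalone demo (mine, not from the thread):

```python
import numpy as np

n = 3
A = np.zeros((n, n))
A.flat[::n + 1] += 7.0  # the strided flat view hits A[0,0], A[1,1], A[2,2]
print(A)
# [[7. 0. 0.]
#  [0. 7. 0.]
#  [0. 0. 7.]]
```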
Disclaimer: I never gave a thought to sample weights before today, so the following might be naive/wrong. I did a little back-of-the-envelope calculation, and it seems to me that the solution of the ridge with sample weights simply corresponds (after centering/normalizing the data) to multiplying X and y by np.sqrt(sample_weight). I can somewhat relate that to what is being done (via a bit of clever algebra, as the solver uses the kernel trick), but it seems to me that it should be alpha / sample_weight. Anyhow, I am not sure that I like the approach that is taken, as it forces the dense_cholesky solver and the kernel trick if sample weights are given. @mblondel, you wrote this code, what do you think? I'll first write a test comparing ridge_regression with sample weights to ridge_regression without sample weights but with rescaled X and y, and then I'll see.
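For reference, a minimal sketch of that back-of-the-envelope calculation (my notation, not from the original comment). With weights $s_i$ and $S = \operatorname{diag}(s)$, the weighted ridge objective and its solution are

$$\min_w \; \sum_i s_i (y_i - x_i^\top w)^2 + \alpha \|w\|^2, \qquad w = (X^\top S X + \alpha I)^{-1} X^\top S y.$$

Writing $\tilde{X} = S^{1/2} X$ and $\tilde{y} = S^{1/2} y$ turns this into plain ridge on the rescaled data, $w = (\tilde{X}^\top \tilde{X} + \alpha I)^{-1} \tilde{X}^\top \tilde{y}$, with $\alpha$ unchanged. In the dual, the identity $(A^\top A + \alpha I)^{-1} A^\top = A^\top (A A^\top + \alpha I)^{-1}$ applied to $A = S^{1/2} X$ gives $w = X^\top (X X^\top + \alpha S^{-1})^{-1} y$, which matches the alpha / sample_weight intuition above: in the kernel formulation, the diagonal regularization term should be divided, not multiplied, by the sample weights.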
By the way, sample weights are so far pretty much untested :(
I have added the test, and it pretty clearly fails. Is my testing logic wrong, or is there a bug?
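A sketch of such a consistency test, using only the public ridge_regression function (the data and setup here are mine, not the actual committed test):

```python
import numpy as np
from sklearn.linear_model import ridge_regression

rng = np.random.RandomState(0)
n_samples, n_features = 6, 5
X = rng.randn(n_samples, n_features)
y = rng.randn(n_samples)
sample_weight = 1.0 + rng.rand(n_samples)
alpha = 1.0

# Ridge with sample weights...
coef_weighted = ridge_regression(X, y, alpha=alpha,
                                 sample_weight=sample_weight)

# ...should match plain ridge on data rescaled by sqrt(sample_weight).
sw = np.sqrt(sample_weight)
coef_rescaled = ridge_regression(X * sw[:, np.newaxis], y * sw, alpha=alpha)

np.testing.assert_array_almost_equal(coef_weighted, coef_rescaled)
```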
See #1190. This method is the method of choice for solver="lsqr" (since we don't control it) and for solver="dense_cholesky" in the primal / linear ridge case: X.T X is an (n_features, n_features) matrix, so I don't see any other way than multiplying X by sqrt(sample_weights) beforehand. For solver="dense_cholesky" in the dual / kernel ridge case, I will check the validity of the current code. The current situation (forcing dense_cholesky) is clearly not good, but this is because originally Ridge only supported one solver.

One thing that I also want to check is whether all solvers really solve the same objective. Some solvers might solve 0.5 ||Xw - y||^2 and others ||Xw - y||^2. Same for the regularization term. To be checked.
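On the objective question, the two conventions differ only in the effective amount of regularization: since scaling an objective by a constant does not change its minimizer,

$$\arg\min_w \; \tfrac{1}{2}\|Xw - y\|^2 + \alpha \|w\|^2 \;=\; \arg\min_w \; \|Xw - y\|^2 + 2\alpha \|w\|^2,$$

so a solver applying the 1/2 factor to the data term only would silently behave as if alpha were doubled. (This equivalence is standard algebra, not a claim about which convention each solver actually uses.)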
The current code in the dual / kernel ridge case is wrong :-( Here's the correct code (with more meaningful variable names and without a copy of X):

```python
import numpy as np
from scipy import linalg
from sklearn.utils.extmath import safe_sparse_dot

K = safe_sparse_dot(X, X.T, dense_output=True)
sw = np.sqrt(sample_weights)
K *= np.outer(sw, sw)             # kernel of the sqrt-weight-rescaled data
K.flat[::n_samples + 1] += alpha  # add alpha to the diagonal
dual_coef = linalg.solve(K, y * sw, sym_pos=True, overwrite_a=True)
dual_coef *= sw                   # map the dual coefficients back to the original data
coef = safe_sparse_dot(X.T, dual_coef, dense_output=True)
```

Gael, can you push the fix for the dual case together with your test (I want to go to bed, it's very late here)?
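To spell out why this works (my summary, using the rescaling result above): with $S = \operatorname{diag}(s)$ and $K = XX^\top$, the kernel of the rescaled data is $\tilde{K} = S^{1/2} K S^{1/2}$, whose entries are exactly $K \odot (\sqrt{s}\,\sqrt{s}^\top)$, i.e. what `K *= np.outer(sw, sw)` computes. One then solves $(\tilde{K} + \alpha I)\tilde{\alpha} = S^{1/2} y$ and maps back to the primal with $w = \tilde{X}^\top \tilde{\alpha} = X^\top (S^{1/2} \tilde{\alpha})$, which is the reason for the `dual_coef *= sw` step.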
@mblondel: thanks for confirming our intuition. I'll push the fix tonight and check that my tests are correct.
OK, I have implemented your fix (and I agree with the calculations). It … That test looks a bit dodgy to me, and I cannot tell if it is right or wrong.
I recently added the test to ensure consistent semantics; LinearSVC did the opposite of SVC before. Do you know a better test that will work for all classifiers?
I am looking at the test, and after a closer look, it looks OK to me.

> do you know a better test that will work for all classifiers?

No. :(
Correction on what I said yesterday evening: I goofed up with branch management in git (doing too many things in parallel). My new tests do not pass with this code. I still need to look at things a bit closer.
By the way, I am doing all the changes in my ridge_sample_weights branch.
Alright, I have found a bug and fixed my test on sample weights. However, I still have the class_weight test failing.
To come back to the original problem that was raised earlier in this issue, and to the test that is currently failing, I believe that the problem is that the decision boundary is not shifted. In other words, changing sample weights does change the direction of this boundary, but not its offset. To me, it is simply that the intercept is unaffected by the sample weights. Do people agree with this analysis?
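One way this could happen, assuming the implementation recovers the intercept by centering the data first (an assumption on my part, not something stated in the thread): if the intercept is computed as $b = \bar{y} - \bar{x}^\top w$ with plain, unweighted means $\bar{x}$ and $\bar{y}$, the sample weights never enter $b$ at all, even when they change $w$. Using weighted means, $\bar{x}_s = \sum_i s_i x_i / \sum_i s_i$ and $\bar{y}_s = \sum_i s_i y_i / \sum_i s_i$, would let the weights shift the offset as well.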
I am unsure about my last comment: modifying the SVM example plot_separating_hyperplane_unbalanced.py to use a ridge classifier seems to work well.
It is a bit weird that this example works but the test does not. Without looking at the code, the intercept explanation sounded plausible. Does the weight change the ratio of the classes in the test at all?
@GaelVaroquaux Thanks for tackling this. I had a quick look at the gradient of the primal objective. I was hoping that for gradient-based iterative solvers we could maybe get around copying X, but apparently it cannot be avoided. So the dual is the only case where we can avoid a copy. Regarding the implementation of #1190, it's trivial in the dense case but may require a bit of work in the sparse case (I don't think CSR sparse matrices support element-wise multiplication with a broadcast 1-d numpy array) — one possible workaround is sketched below.
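For the sparse case, one workaround (a sketch under my own assumptions, not what the PR ended up doing) is to left-multiply by a sparse diagonal matrix instead of broadcasting:

```python
import numpy as np
import scipy.sparse as sp

def rescale_data(X, y, sample_weight):
    # Hypothetical helper: rescale X and y by sqrt(sample_weight) so that
    # unweighted ridge on the rescaled data solves the weighted problem.
    sw = np.sqrt(sample_weight)
    if sp.issparse(X):
        # CSR matrices do not broadcast against 1-d arrays, but
        # left-multiplying by a sparse diagonal matrix rescales the rows.
        X_rescaled = sp.diags(sw) @ X
    else:
        X_rescaled = X * sw[:, np.newaxis]
    return X_rescaled, y * sw
```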
Is there a chance to fix that for the release? I'm not that familiar with the method and would like to avoid digging too deep today.
I'll try to finish that off today. I'll give it a couple of hours, and if I …
Great, thanks.
Alright, I think that I have fixed all the problems. I'll do a PR in a minute. All the tests pass, and the fishy example reported by @agramfort now gives something consistent (I had to reduce the weights from 1e10 to 1e1, as otherwise it was too violent).
Fixed in PR #1600.