Unexpected class_weight behavior in RidgeClassifier · Issue #1489 · scikit-learn/scikit-learn · GitHub

Closed
amueller opened this issue Dec 24, 2012 · 28 comments

@amueller
Member

While testing the class_weight parameter in several estimators to investigate #1411, I noticed that the parameter doesn't work as I would expect in RidgeClassifier either.
Maybe this is some misunderstanding on my part or some regularization issue, but with noisy labels I would have expected to be able to move the decision boundary.

This is not the case, as illustrated in this notebook.

Any input would be appreciated.
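
For a self-contained illustration of the expectation (a sketch on synthetic data, not the notebook linked above; with a fixed implementation, upweighting class 1 should increase the fraction of points predicted as class 1):

from sklearn.datasets import make_classification
from sklearn.linear_model import RidgeClassifier

# Overlapping classes with 20% label noise: upweighting class 1 should pull
# the decision boundary so that more points are predicted as class 1.
X, y = make_classification(n_samples=200, n_features=2, n_redundant=0,
                           flip_y=0.2, random_state=0)
for cw in (None, {0: 1, 1: 3}, {0: 1, 1: 10}):
    clf = RidgeClassifier(class_weight=cw).fit(X, y)
    print(cw, (clf.predict(X) == 1).mean())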

@amueller
Member Author

I think I found the explanation in #1190. Class weights and sample weights are only supported for a certain solver; otherwise, class_weight and sample_weight are silently ignored. Also, this is not documented anywhere.
Moreover, using RidgeClassifierCV, it is not possible to actually specify the solver as far as I can tell. (A quick way to detect such silent ignoring is sketched below.)
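
A sketch of such a check (not the project's actual test; in a version where the weights are honored, the two fits differ and this prints False):

import numpy as np
from sklearn.datasets import make_blobs
from sklearn.linear_model import RidgeClassifier

X, y = make_blobs(centers=2, cluster_std=1.3, random_state=0)
plain = RidgeClassifier(solver="lsqr").fit(X, y)
weighted = RidgeClassifier(solver="lsqr", class_weight={0: 10, 1: 1}).fit(X, y)
# If class_weight were silently dropped for this solver, the fits would match.
print(np.allclose(plain.coef_, weighted.coef_))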

@amueller
Member Author

Ok, I was wrong. Actually, if sample_weight is specified, the solver is silently overwritten to be dense_cholesky.
So I have no idea why the sample weights don't work as I expect.

@agramfort
Member

I tried this script. The behavior is indeed a bit counterintuitive: the lines always cross, so the hyperplane cannot really be shifted...

import numpy as np
import pylab as pl
from sklearn.datasets import make_blobs
from sklearn.linear_model import RidgeClassifier

X, y = make_blobs(centers=2, cluster_std=1.3, random_state=0)
pl.prism()
pl.scatter(X[:, 0], X[:, 1], c=y)
y = 2 * y - 1  # remap labels from {0, 1} to {-1, 1}

fit_intercept = True
clf0 = RidgeClassifier(fit_intercept=fit_intercept)
clf0.fit(X, y)
print(clf0.intercept_)
# labels are now -1 / 1, so the class_weight dicts must use those keys
clf1 = RidgeClassifier(class_weight={-1: 1e-10, 1: 1e10}, fit_intercept=fit_intercept)
clf1.fit(X, y)
print(clf1.intercept_)
clf2 = RidgeClassifier(class_weight={-1: 1e10, 1: 1e-10}, fit_intercept=fit_intercept)
clf2.fit(X, y)
print(clf2.intercept_)

def plot_line(clf):
    # decision boundary w0 * x0 + w1 * x1 + intercept = 0, solved for x1
    w = clf.coef_.ravel()
    a = -w[0] / w[1]
    xx = np.linspace(-30, 30, 100)
    yy = a * xx - clf.intercept_ / w[1]
    pl.plot(xx, yy)

pl.scatter(X[:, 0], X[:, 1], c=y)
plot_line(clf0)
plot_line(clf1)
plot_line(clf2)
pl.grid(True)
pl.show()

@agramfort
Member

A good test could be to duplicate the instances of one class and see if it gives the same result as a class_weight of 2 for that class; a sketch follows below.
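
A sketch of that check on the blob data from the script above (with a correct implementation, a class weight of 2 should match duplicating the class's samples exactly, since the penalty term is unchanged):

import numpy as np
from sklearn.datasets import make_blobs
from sklearn.linear_model import RidgeClassifier

X, y = make_blobs(centers=2, cluster_std=1.3, random_state=0)

# Weighting class 1 by 2 ...
clf_w = RidgeClassifier(class_weight={0: 1, 1: 2}).fit(X, y)

# ... should give the same fit as duplicating every class-1 sample.
X_dup = np.vstack([X, X[y == 1]])
y_dup = np.concatenate([y, y[y == 1]])
clf_d = RidgeClassifier().fit(X_dup, y_dup)

print(np.allclose(clf_w.coef_, clf_d.coef_),
      np.allclose(clf_w.intercept_, clf_d.intercept_))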

@amueller
Member Author

For the lazy ones: this is the output of Alex's script:

(image: output plot of the script above)

This is indeed a bit odd. I tried an example where the boundary is very noisy; in that case I would really expect that modifying the weights shifts the boundary. I'll give your idea a try.

@mblondel
Member

We need to rederive the closed-form solution when sample weights are used and check whether the current code is correct. I'm on holiday, so it's a bit difficult for me to check this right now. I will try to do it later if no one else has done it before then.

@weilinear
Member

I think Ridge regression has this bug as well. In ridge_regression():

A = np.dot(X, X.T)
A.flat[::n_samples + 1] += alpha * sample_weight  # adds alpha * sample_weight to the diagonal

The sample_weight is multiplied by the regularization parameter, which seems a little weird to me.

@GaelVaroquaux
Member

Disclaimer: I never gave a thought to sample weights before today, so the following might be naive/wrong.

I did a little back-of-the-envelope calculation, and it seems to me that the solution of the ridge with sample weights simply corresponds to (after having centered/normalized the data) multiplying X and y by np.sqrt(sample_weight).

I can somewhat relate that to what is being done (via a bit of clever algebra, as the solver uses the kernel trick), but it seems to me that it should be alpha / sample_weight.

Anyhow, I am not sure that I like the approach taken, as it forces the dense_cholesky solver and the kernel trick when sample weights are given. @mblondel, you wrote this code; what do you think?

I'll first write a test comparing ridge_regression with sample weights to ridge_regression without sample weights but with X and y rescaled, and then I'll see. (A numerical check of both claims is sketched below.)
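
A quick numerical check of both claims with plain NumPy on random data (a sketch, not scikit-learn code):

import numpy as np

rng = np.random.RandomState(0)
n, d = 20, 5
X, y = rng.randn(n, d), rng.randn(n)
w = rng.rand(n) + 0.5  # sample weights
alpha = 1.0

# Reference: weighted ridge normal equations, (X^T W X + alpha I) coef = X^T W y
ref = np.linalg.solve(X.T @ (w[:, None] * X) + alpha * np.eye(d), X.T @ (w * y))

# Claim 1: rescaling X and y by sqrt(w) gives the same solution.
sw = np.sqrt(w)
Xs, ys = sw[:, None] * X, sw * y
rescaled = np.linalg.solve(Xs.T @ Xs + alpha * np.eye(d), Xs.T @ ys)

# Claim 2: in the dual, the diagonal term is alpha / w, not alpha * w:
# coef = X^T (X X^T + alpha * diag(1 / w))^{-1} y
dual = X.T @ np.linalg.solve(X @ X.T + alpha * np.diag(1.0 / w), y)

print(np.allclose(ref, rescaled), np.allclose(ref, dual))  # True True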

@GaelVaroquaux
Member

By the way, sample weights are so far pretty much untested :(

@GaelVaroquaux
Member

I have added the test:
https://github.com/GaelVaroquaux/scikit-learn/tree/ridge_sample_weights

And it pretty clearly fails. Is my testing logic wrong, or is there a bug?

@mblondel
Member

> I did a little back-of-the-envelope calculation, and it seems to me that the solution of the ridge with sample weights simply corresponds to (after having centered/normalized the data) multiplying X and y by np.sqrt(sample_weight).

See #1190

This is the method of choice for solver="lsqr" (since we don't control it) and for solver="dense_cholesky" in the primal / linear ridge case: X.T X is an (n_features, n_features) matrix, so I don't see any other way than multiplying X by sqrt(sample_weights) beforehand.

For solver="dense_cholesky" in the dual / kernel ridge case, I will check the validity of the current code.
For solver="sparse_cg", there might be a way to prevent a copy of X.

The current situation (forcing dense_cholesky) is clearly not good but this is because originally Ridge only supported one solver.

One thing that I also want to check is whether all solvers really solve the same objective: some might solve 0.5 ||Xw - y||^2 and others ||Xw - y||^2, and likewise for the regularization term. To be checked (a quick comparison is sketched below).
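
A quick way to run that comparison (a sketch using today's solver names; at the time of this issue the Cholesky solver was called dense_cholesky):

import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge

X, y = make_regression(n_samples=50, n_features=5, random_state=0)
base = Ridge(alpha=1.0, solver="cholesky").fit(X, y).coef_
for solver in ("svd", "lsqr", "sparse_cg"):
    coef = Ridge(alpha=1.0, solver=solver, tol=1e-10).fit(X, y).coef_
    # If all solvers minimize the same objective, the coefficients agree.
    print(solver, np.allclose(coef, base, atol=1e-6))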

@mblondel
Member

The current code in the dual / kernel ridge case is wrong :-( Here's the corrected code (with more meaningful variable names and without a copy of X):

import numpy as np
from scipy import linalg
from sklearn.utils.extmath import safe_sparse_dot

K = safe_sparse_dot(X, X.T, dense_output=True)  # linear kernel Gram matrix
sw = np.sqrt(sample_weights)
K *= np.outer(sw, sw)             # equivalent to rescaling the rows of X by sw
K.flat[::n_samples + 1] += alpha  # add alpha to the diagonal
dual_coef = linalg.solve(K, y * sw, sym_pos=True, overwrite_a=True)
dual_coef *= sw                   # coef = (sw[:, None] * X).T @ dual, so scale the dual back
coef = safe_sparse_dot(X.T, dual_coef, dense_output=True)

Gael, can you push the fix for the dual case together with your test (I want to go to bed, it's very late here)?

@GaelVaroquaux
Member

@mblondel : thanks for confirming our intuition. I'll push the fix tonight and check that my tests are correct.

@GaelVaroquaux
Member

OK, I have implemented your fix (and I agree with the calculations). It passes my tests. The bad news is that I now have a failing test: test_class_weights.

That test looks a bit dodgy to me, and I cannot tell whether it is right or not. I need to run right now, but I'll look at it a bit more.

@amueller
Member Author

I recently added the test to ensure consistent semantics. LinearSVC did the opposite of SVC before. Do you know a better test that will work for all classifiers?


@GaelVaroquaux
Member

> I recently added the test to ensure consistent semantics.

I am looking at the test, and after a closer look, it looks OK to me. I don't know why the new code is posing a problem. I need to investigate.

> Do you know a better test that will work for all classifiers?

No. :(

@GaelVaroquaux
Member

Correction to what I said yesterday evening: I goofed up with branch management in git (doing too many things in parallel). My new tests do not pass with this code. I still need to look at things a bit more closely.

@GaelVaroquaux
Member

By the way, I am doing all the changes in my ridge_sample_weights branch.

@GaelVaroquaux
Member

Alright, I have found a bug and fixed my test on sample weights. However, the class_weight test is still failing.

@GaelVaroquaux
Member

To come back to the original problem raised earlier in this issue, and to the test that is currently failing: I believe the problem is that the decision boundary is not shifted. In other words, changing sample weights does change the direction of the boundary, but not its offset.

To me, it is simply that the intercept is unaffected by the sample weights. Do people agree with this analysis? (A quick probe of this is sketched below.)
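
A quick probe of this hypothesis (a sketch against the fixed code; if the intercept responds to the weights, it should move in opposite directions for the two weightings):

from sklearn.datasets import make_blobs
from sklearn.linear_model import RidgeClassifier

X, y = make_blobs(centers=2, cluster_std=1.3, random_state=0)
for cw in (None, {0: 10, 1: 1}, {0: 1, 1: 10}):
    clf = RidgeClassifier(class_weight=cw).fit(X, y)
    print(cw, clf.intercept_)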

@GaelVaroquaux
Member

I am unsure about my last comment: modifying the SVM example plot_separating_hyperplane_unbalanced.py to use a ridge classifier seems to work well.

@amueller
Member Author

It is a bit weird that this example works but the test does not. Without looking at the code, the intercept explanation sounded plausible. Does the weight change the ratio of the classes in the test at all?


@mblondel
Member

@GaelVaroquaux Thanks for tackling this.

I had a quick look at the gradient of the primal objective. I was hoping that for gradient-based iterative solvers we could get around copying X, but apparently it cannot be avoided. So the dual is the only case where we can avoid a copy.

Regarding the implementation of #1190, it's trivial in the dense case but may require a bit of work in the sparse case (I don't think CSR sparse matrices support element-wise multiplication with a broadcast 1d numpy array); see the workaround sketched below.
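
For reference, a standard workaround is to multiply by a sparse diagonal matrix on the left, or to scale X.data in place using the indptr row boundaries (a sketch, not necessarily the code that went into the fix):

import numpy as np
import scipy.sparse as sp

X = sp.rand(6, 4, density=0.5, format="csr", random_state=0)
sw = np.sqrt(np.arange(1.0, 7.0))  # sqrt of per-sample weights

X_scaled = sp.diags(sw) @ X  # rows scaled by sw, stays sparse
# In-place alternative: X.data *= np.repeat(sw, np.diff(X.indptr))
print(np.allclose(X_scaled.toarray(), sw[:, None] * X.toarray()))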

@amueller
Member Author

Is there a chance to fix that for the release? I'm not that familiar with the method and would like to avoid digging too deep today.

@GaelVaroquaux
Member

> Is there a chance to fix that for the release? I'm not that familiar with the method and would like to avoid digging too deep today.

I'll try to finish it off today. I'll give it a couple of hours, and if I cannot get it to work, I'll move on to something else.

@amueller
Member Author

Great, thanks.

@GaelVaroquaux
Member

Alright, I think I have fixed all the problems. I'll open a PR in a minute. All the tests pass, and the fishy example reported by @agramfort now gives something consistent (I had to reduce the weights from 1e10 to 1e1, as otherwise it was too violent):

(image: class_weights plot after the fix)

@GaelVaroquaux
Member

Fixed in PR #1600
