Weird behavior in LogisticRegression on parameter class_weight #1411
I think this implementation uses LibLinear, so the meaning of class_weight depends on the actual optimization method implemented. I tried different regularizers, and the number of 1s predicted also changed dramatically. Maybe you shouldn't set the class weight :)
It's weird behavior; we should consider it a bug, right? I looked up the LibLinear README, and line 198 of the LibLinear 1.92 README says: "We implement 1-vs-the rest multi-class strategy for classification." Obviously LibLinear supports different weights for different classes. Is this a bug in the sklearn wrapper, or a bug in liblinear?
I think this is unlikely to be a bug in sklearn or liblinear. I tried a couple of examples and printed out the confusion matrix. You can read more about the parameter C in the following reference: http://jmlr.csail.mit.edu/papers/volume9/fan08a/fan08a.pdf
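For reference, a minimal sketch of the kind of confusion-matrix check described above (the data and weights here are made up for illustration, not taken from the thread):

from sklearn import datasets, linear_model
from sklearn.metrics import confusion_matrix

# Synthetic 2-class problem; any class_weight effect shows up in the off-diagonals.
X, y = datasets.make_classification(n_samples=100, n_features=2,
                                    n_redundant=0, random_state=0)
clf = linear_model.LogisticRegression(class_weight={0: 0.9, 1: 0.1})
clf.fit(X, y)
print(confusion_matrix(y, clf.predict(X)))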
Sorry for the lack of feedback, the devs are busy at the moment.
Really? Very strange. I can see that the comment count is increasing.
@fannix Well, now I have the reply in my inbox a couple of times but still cannot see it here. Maybe try to contact GitHub about this? They are usually very responsive. It shows that you are part of the discussion, at least.
OK, thanks.
@amueller You can read my comments now.
Yes :) Great. Did you contact GitHub?
Yes, it turns out that I was recognized as a spammer and blocked...
@fannix Let that be a lesson to you ;) [the lesson being that machine learning algorithms cannot be trusted]
OK, after having looked at this for 10 minutes: this is definitely a bug in the semantics of sklearn. After a quick check, this produces results similar to class weights with SVC(kernel="linear"). I'll have a look at what we are doing in SVC now.
Hm, there is no explanation of the class weights in the liblinear docs. I have no idea why the behavior of liblinear and libsvm differs here. It seems like we do exactly the same thing on the scikit-learn side.
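To make the inconsistency concrete, here is a minimal comparison sketch (hypothetical data; at the time of this thread the two counts disagreed, while after the fix they should roughly match):

import numpy as np
from sklearn import datasets, svm

X, y = datasets.make_classification(n_samples=200, n_features=2,
                                    n_redundant=0, random_state=0)
weights = {0: 0.9, 1: 0.1}

# Same class_weight, two wrappers: libsvm (SVC) vs liblinear (LinearSVC).
for clf in (svm.SVC(kernel="linear", class_weight=weights),
            svm.LinearSVC(class_weight=weights)):
    clf.fit(X, y)
    n0 = np.sum(clf.predict(X) == 0)
    print(clf.__class__.__name__, "predicts", n0, "of 200 samples as class 0")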
I think the class weight is the same as in LibSVM, as described in http://jmlr.csail.mit.edu/papers/volume9/fan08a/fan08a.pdf
@fannix From the docs it looks like it would be the same. But then why is the effect different? Maybe somewhere in the OvA and OvO code the class indices are switched... uh oh, starting to feel a bit guilty now... maybe I broke this...
I'm still bisecting, but it seems that I reversed the behavior of SVC, but not LinearSVC, when I messed with the signs / class ordering. Which would explain why they are inconsistent now.
Turns out I'm really bad with git bisect :(
Any help with finding out when the behavior of SVC changed, and any ideas about what to do about it, would be welcome.
@amueller I checked the liblinear source code: it only supports class weights for certain solvers. In the train routine, the binary case is handled separately from the one-vs-all loop: the first class is labeled +1, and train_one is called with weighted_C[0] for the positive class and weighted_C[1] for the negative one (see https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/svm/src/liblinear/linear.cpp#L2443). So the weighted C of the positive class is weighted_C[0]. The question is the following: does liblinear sort the class labels in ascending order and pick the first one as the positive class? If so, that's the opposite of what sklearn does... For OvA and Crammer & Singer this is not an issue.
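A quick empirical probe of that question, as a sketch (assumption: upweighting a class should pull more predictions toward it; if the counts move the other way, the positive/negative mapping is flipped):

import numpy as np
from sklearn import datasets, linear_model

X, y = datasets.make_classification(n_samples=200, n_features=2,
                                    n_redundant=0, random_state=0)

# Upweight each class in turn and watch which way the predictions move.
for w in ({0: 10.0, 1: 1.0}, {0: 1.0, 1: 10.0}):
    clf = linear_model.LogisticRegression(class_weight=w)
    clf.fit(X, y)
    print(w, "->", np.bincount(clf.predict(X), minlength=2))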
Thanks for investigating!
import pylab as pl
import numpy as np
from sklearn import datasets, linear_model

X, y = datasets.make_classification(n_samples=100, n_features=2, n_redundant=0)
pl.scatter(X[:, 0], X[:, 1], c=y)

# Unweighted baseline.
clr0 = linear_model.LogisticRegression()
clr0.fit(X, y)
print(clr0.predict(X).sum())  # number of samples predicted as class 1
w = clr0.coef_[0]
a = -w[0] / w[1]
xx = np.linspace(-5, 5)
yy = a * xx - clr0.intercept_[0] / w[1]
pl.plot(xx, yy, 'k--', label='no weights')

# Heavily weighted toward class 0.
clr1 = linear_model.LogisticRegression(class_weight={0: 0.9, 1: 0.1})
clr1.fit(X, y)
w1 = clr1.coef_[0]
a1 = -w1[0] / w1[1]
xx1 = np.linspace(-5, 5)
yy1 = a1 * xx1 - clr1.intercept_[0] / w1[1]  # was "a * xx1", which plotted the wrong slope
pl.plot(xx1, yy1, 'k-', label='with weights')
pl.legend()
pl.show()

This is the plot showing the imbalance problem. Hope it helps. I am also trying to make sense of the class weights...
I tried a couple of classifiers. It turns out LinearSVC and LogisticRegression have the same behavior, and both behave differently from SVC(kernel="linear").
According to the LibSVM guide (http://pyml.sourceforge.net/doc/howto.pdf): assuming n+ (n-) is the number of positive (negative) examples, one chooses C+ and C- so that C+ * n+ = C- * n-. Therefore I don't think a large C+ leads to a stronger weight for the positive class.
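A small worked example of that heuristic (the counts are invented for illustration): with weights chosen so that C+ * n+ = C- * n-, the rarer class ends up with the larger per-class C.

# 20 positive vs 80 negative samples (made-up counts).
n_pos, n_neg = 20, 80
C = 1.0
C_pos = C * n_neg / float(n_pos + n_neg)  # 0.8
C_neg = C * n_pos / float(n_pos + n_neg)  # 0.2
assert abs(C_pos * n_pos - C_neg * n_neg) < 1e-12  # balanced total penalty
print(C_pos, C_neg)  # the rare positive class gets the larger C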
I think we have to be careful - first of all, I'm referring to liblinear.
OK, this is worse than I thought, as it means the 'auto' parameter does completely crazy things?!
OK, sorry for the long delay, getting back to it now. I think my approach will be to write tests that check the expected behavior, and to put in some switches so it does what I think it should. The signs and the 0 <-> 1 switch are already somewhat messed up, unfortunately :(
Corrected my statement above: a larger C_i means more samples are classified as class i!
There is a fix in #1491.
Thanks for all your hard work ^.^
No problem :)
Merged #1491, so this is fixed in master now :)
The class_weight parameter sets a different weight for misclassifying each class. For example, in a 0/1 classification problem, if we set class_weight={0: 0.95, 1: 0.05}, then we can expect the classifier to be more careful about class 0, since misclassifying a 0 as a 1 is heavily penalized.
But the LogisticRegression class seems to go wrong:
The imbalanced classifier clr1 is supposed to classify more data as label 0, but it actually predicts far more data as 1. When we choose another classifier, say SVC, the behavior seems reasonable:
When class_weight[0] vs class_weight[1] is 5:5, SVC predicts roughly half of the data as 0.
When class_weight[0] vs class_weight[1] is 6:4, SVC predicts more of the data as 0.
When class_weight[0] vs class_weight[1] is 9:1, SVC predicts all of the data as 0.
Is this a bug?
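A minimal sketch reproducing the SVC behavior described above (synthetic data, so the exact counts will differ from the original report):

import numpy as np
from sklearn import datasets, svm

X, y = datasets.make_classification(n_samples=100, n_features=2,
                                    n_redundant=0, random_state=0)
# Sweep the weight ratio from balanced to strongly favoring class 0.
for w0, w1 in ((0.5, 0.5), (0.6, 0.4), (0.9, 0.1)):
    clf = svm.SVC(kernel="linear", class_weight={0: w0, 1: w1})
    clf.fit(X, y)
    n0 = np.sum(clf.predict(X) == 0)
    print("weights %g:%g -> %d of 100 predicted as 0" % (w0, w1, n0))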