Weird behavior in LogisticRegression on parameter class_weight · Issue #1411 · scikit-learn/scikit-learn · GitHub
Weird behavior in LogisticRegression on parameter class_weight #1411


Closed
ShusenLiu opened this issue Nov 26, 2012 · 31 comments · Fixed by #1491

@ShusenLiu

The class_weight parameter sets a different misclassification penalty for each class. For example, in a 0/1 classification problem, if we set class_weight={0:0.95, 1:0.05}, we can expect the classifier to be more careful with class-0 samples, since misclassifying a 0 as a 1 is heavily penalized.

But the LogisticRegression class seems to get this wrong:

from sklearn import datasets
from sklearn import svm
from sklearn import linear_model

# 100 samples; half labeled 0, the other half 1
X, Y = datasets.make_classification()
Y.sum()
>>> 50 

# unweighted (balanced) LR classifier
clr0 = linear_model.LogisticRegression()
clr0.fit(X, Y)
clr0.score(X,Y)
>>> 0.84999999999999998
clr0.predict(X).sum()
>>> 49

# class-weighted LR classifier
clr1 = linear_model.LogisticRegression(class_weight={0:0.9, 1:0.1})
clr1.fit(X, Y)
clr1.score(X,Y)
>>> 0.63
clr1.predict(X).sum()
>>> 85

The weighted classifier clr1 is supposed to classify more data as label 0, but it actually predicts far more data as 1. When we choose another classifier, say SVM, the behavior seems reasonable:

 
# unweighted (balanced) SVM classifier
clr2 = svm.SVC()
clr2.fit(X,Y)
clr2.score(X,Y)
>>> 0.95999999999999996
clr2.predict(X).sum()
>>> 46

# class-weighted SVM classifier
clr3 = svm.SVC(class_weight={0:0.6, 1:0.4})
clr3.fit(X,Y)
clr3.score(X,Y)
>>> 0.84999999999999998
clr3.predict(X).sum()
>>> 35.0

# another class-weighted SVM classifier
clr4 = svm.SVC(class_weight={0:0.9, 1:0.1})
clr4.fit(X,Y)
clr4.score(X,Y)
>>> 0.5
clr4.predict(X).sum()
>>> 0.0

When class_weight[0] vs class_weight[1] is 5:5, SVC predicts roughly half of the data as 0.
When class_weight[0] vs class_weight[1] is 6:4, SVC predicts more of the data as 0.
When class_weight[0] vs class_weight[1] is 9:1, SVC predicts all of the data as 0.

Is this a bug?
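For reference, the semantics the report assumes can be checked without liblinear at all. Below is a minimal sketch (assumptions: plain NumPy, two Gaussian blobs, and a hypothetical fit_weighted_lr helper doing gradient descent on the class-weighted log loss): up-weighting class 0 makes class-0 errors costlier, so fewer samples should be predicted as 1.

```python
import numpy as np

rng = np.random.RandomState(0)
# Two Gaussian blobs: class 0 centered at (-1, -1), class 1 at (+1, +1).
X = np.vstack([rng.randn(50, 2) - 1, rng.randn(50, 2) + 1])
y = np.array([0] * 50 + [1] * 50)

def fit_weighted_lr(X, y, class_weight, lr=0.1, n_iter=500):
    """Gradient descent on the class-weighted logistic loss."""
    w, b = np.zeros(X.shape[1]), 0.0
    sw = np.where(y == 1, class_weight[1], class_weight[0])  # per-sample weight
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))   # predicted P(y=1)
        g = sw * (p - y)                          # weighted log-loss gradient
        w -= lr * (X.T @ g) / len(y)
        b -= lr * g.mean()
    return w, b

def predict(X, w, b):
    return (X @ w + b > 0).astype(int)

w0, b0 = fit_weighted_lr(X, y, {0: 0.5, 1: 0.5})   # equal weights
w1, b1 = fit_weighted_lr(X, y, {0: 0.9, 1: 0.1})   # favor class 0
n0, n1 = predict(X, w0, b0).sum(), predict(X, w1, b1).sum()
print(n0, n1)   # n1 should not exceed n0
```

Under these semantics, the weighted model should predict 1 at most as often as the unweighted one, which is the opposite of what the LogisticRegression run above shows.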

@fannix
Contributor
fannix commented Nov 27, 2012

I think this implementation uses Liblinear, so the meaning of the class weight depends on the actual optimization method implemented. I tried different regularizers, and the number of 1s predicted also changed dramatically. Maybe you shouldn't set the class weight :)

@ShusenLiu
Author

This is weird behavior; we should consider it a bug, right?

I looked up the LibLinear README, line 198:


LibLinear 1.92 README, line 198:

We implement the 1-vs-the-rest multi-class strategy for classification.
In training i vs. non_i, their C parameters are (weight from -wi)*C
and C, respectively. If there are only two classes, we train only one
model. Thus weight1*C vs. weight2*C is used. See examples below.


Obviously LibLinear supports different weights for different classes.

Is this a bug in sklearn wrapper, or a bug in liblinear?


@fannix
Contributor
fannix commented Nov 28, 2012

I think this is unlikely to be a bug in sklearn or liblinear.

I tried a couple of examples and printed out the confusion matrix. I found that the larger the weight you give to class 0, the harder the LR classifier tries to avoid misclassifying class-1 samples as 0, and hence it refrains from predicting class 0 at all. This causes fewer instances to be predicted as class 0, contradicting what you assume.

You can read more about the parameter C in the following references:

http://jmlr.csail.mit.edu/papers/volume9/fan08a/fan08a.pdf
http://pyml.sourceforge.net/doc/howto.pdf
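The confusion-matrix check described above can be sketched against the current scikit-learn API. Note that on releases containing the fix from #1491, the direction is the reverse of what this comment observed: a larger weight on class 0 pulls predictions toward class 0.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix

X, y = make_classification(n_samples=100, random_state=0)

clf = LogisticRegression().fit(X, y)
clf_w = LogisticRegression(class_weight={0: 0.9, 1: 0.1}).fit(X, y)

# Rows are true classes, columns are predicted classes.
print(confusion_matrix(y, clf.predict(X)))
print(confusion_matrix(y, clf_w.predict(X)))
```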


@amueller
Member

Sorry for the lack of feedback, the devs are busy at the moment.
I'll try to take a look later today.
Meng Xinfan: Sorry, I can't see your posts on github either :-/

@fannix
Contributor
fannix commented Nov 28, 2012

Really? Very strange. I can see that the comment count is increasing. Anyway, my reply is the "I think this is unlikely to be a bug..." comment above.


@amueller
Member

@fannix Well, now I have the reply in my inbox a couple of times but still cannot see it here. Maybe contact GitHub about this? They are usually very responsive. It shows that you are part of the discussion, at least.

@fannix
Contributor
fannix commented Nov 28, 2012

OK, thanks.

@fannix
Contributor
fannix commented Nov 30, 2012

@amueller You can read my comments now.

@amueller
Member

Yes :) great. Did you contact github?
@ShusenLiu sorry for the delay, I really meant to have a look at this but I've been busy. Will do on the weekend.

@fannix
Contributor
fannix commented Nov 30, 2012

Yes, it turns out that I was flagged as a spammer and blocked.


@amueller
Member

@fannix Let that be a lesson to you ;)

[ the lesson being that machine learning algorithms can not be trusted]

@amueller
Member

Ok after having looked at this for 10 minutes, this is definitely a bug in the semantics of sklearn.
Liblinear considers "class weights" the scaling of "C". So higher means more regularization!
While in sklearn higher means usually "more important".
My fix would be to take the inverse of the class weight before passing it to liblinear.

After a quick check this produces similar results as class weights with SVC(kernel="linear").

I'll have a look at what we are doing in SVC now.

@amueller
Member

Hm there is no explanation of the class weights in the liblinear docs. I have no idea why the behavior of liblinear and libsvm is different here. It seems like we do exactly the same thing on the scikit-learn side.
@larsmans do you have any idea?

@fannix
Contributor
fannix commented Nov 30, 2012

I think the class weight is the same as LibSVM's, as described in the paper below:

http://jmlr.csail.mit.edu/papers/volume9/fan08a/fan08a.pdf


@amueller
Member
amueller commented Dec 1, 2012

@fannix from the docs it looks like it would be the same. But then why is the effect different? Maybe somewhere in the ova and ovo the class indices are switched.... uh oh starting to feel a bit guilty now... maybe I broke this...

@amueller
Member
amueller commented Dec 1, 2012

I'm still bisecting but it seems that I reversed the behavior of SVC, but not LinearSVC when I messed with the signs / class ordering. Which would explain why they are inconsistent now.

@amueller
Member
amueller commented Dec 1, 2012

Turns out I'm really bad with git bisect :(

@amueller
Member
amueller commented Dec 1, 2012

Any help with finding out when the behavior of SVC changed and any idea what to do about it would be welcome.

@pprett
Member
pprett commented Dec 4, 2012

@amueller I checked the liblinear source code: it only supports class weights if the solver is not one of::

param->solver_type == L2R_L2LOSS_SVR ||
param->solver_type == L2R_L1LOSS_SVR_DUAL ||
param->solver_type == L2R_L2LOSS_SVR_DUAL

The else branch in linear.cpp #2344 is the relevant part: here the weighted_C array gets created using the class weights and the C parameter (see https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/svm/src/liblinear/linear.cpp#L2389 )

when you print weighted_C in the above example you'll get the following output::

0:0.9000
1:0.1000

the routine train_one implements binary classification; the last two arguments Cp and Cn are the weighted Cs for the positive and negative class, respectively.

For binary classification with logistic loss the invocation of train_one looks as follows::

train_one(&sub_prob, param, &model_->w[0], weighted_C[0], weighted_C[1]);

(see https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/svm/src/liblinear/linear.cpp#L2443)

So the weighted C of the positive class is weighted_C[0] and that of the negative class is weighted_C[1]; this is wrong, it should be the other way around.
If you switch the indices 0 and 1, the results should be OK.

The question is: does liblinear sort the class labels in ascending order and pick the first one as the positive class? If so, that's the opposite of what sklearn does...

For OVA and Crammer & Singer this is not an issue.
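The index handling described here can be mirrored in a few lines of Python; weighted_C, Cp, and Cn follow the names in linear.cpp, but this is only an illustration of the swap, not the real solver:

```python
C = 1.0
class_weight = {0: 0.9, 1: 0.1}

# liblinear builds weighted_C[i] = class_weight[label_i] * C, in label order.
weighted_C = [class_weight[0] * C, class_weight[1] * C]

# Buggy binary call: Cp (positive class) taken from index 0, Cn from
# index 1, even though label 1 is treated as the positive class.
Cp_buggy, Cn_buggy = weighted_C[0], weighted_C[1]

# Fixed call: swap the indices so each class gets its own weighted C.
Cp_fixed, Cn_fixed = weighted_C[1], weighted_C[0]

print(Cp_buggy, Cn_buggy)   # 0.9 0.1
print(Cp_fixed, Cn_fixed)   # 0.1 0.9
```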


@amueller
Member
amueller commented Dec 4, 2012

Thanks for investigating!
Should we just switch the indices there?
I switched some indices in LibSVM to fix the sign of the decision function, and I think this is the reason why the two now behave differently.
So is it right that a large C -> stronger weight for the positive class? That means the current docstring is confusing at best, in particular because it is the same for LibSVM and LibLinear, which have opposite behaviors...

@fannix
Contributor
fannix commented Dec 4, 2012
import pylab as pl
import sklearn
from sklearn import linear_model, svm
import numpy as np
from sklearn import datasets

X, y = datasets.make_classification(n_samples=100, n_features=2, n_redundant=0)
pl.scatter(X[:, 0], X[:, 1], c=y)

clr0 = linear_model.LogisticRegression()
clr0.fit(X, y)
clr0.predict(X).sum()
w = clr0.coef_[0]
a = -w[0] / w[1]                         # slope of the decision boundary
xx = np.linspace(-5, 5)
yy = a * xx - clr0.intercept_[0] / w[1]  # boundary: w[0]*x + w[1]*y + b = 0
pl.plot(xx, yy, 'k--', label='no weights')

clr1 = linear_model.LogisticRegression(class_weight={0: 0.9, 1: 0.1})
clr1.fit(X, y)
w1 = clr1.coef_[0]
a1 = -w1[0] / w1[1]
xx1 = np.linspace(-5, 5)
yy1 = a1 * xx1 - clr1.intercept_[0] / w1[1]  # use a1 (slope of the weighted model), not a
pl.plot(xx1, yy1, 'k-', label='with weights')

pl.legend()

pl.show()

This is the plot showing the imbalance problem. Hope it helps. I am also trying to make sense of the class weights...

@fannix
Contributor
fannix commented Dec 4, 2012

I tried a couple of classifiers. It turns out LinearSVC and LogisticRegression behave the same, and both behave differently from SVC(kernel="linear").
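A sketch of that comparison with the current API (after #1491 landed, all three should agree in direction; the dataset and weights here are illustrative):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC, LinearSVC

X, y = make_classification(n_samples=100, random_state=0)
cw = {0: 0.9, 1: 0.1}  # penalize class-0 mistakes nine times harder

counts = {}
for build in (lambda w: LogisticRegression(class_weight=w),
              lambda w: LinearSVC(class_weight=w),
              lambda w: SVC(kernel="linear", class_weight=w)):
    plain = build(None).fit(X, y).predict(X).sum()      # unweighted baseline
    weighted = build(cw).fit(X, y).predict(X).sum()     # with class weights
    counts[type(build(None)).__name__] = (plain, weighted)

for name, (plain, weighted) in counts.items():
    # Each classifier should predict no more 1s once class 0 is up-weighted.
    print(name, plain, weighted)
```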

@fannix
Contributor
fannix commented Dec 4, 2012

According to the LibSVM guide: http://pyml.sourceforge.net/doc/howto.pdf

Assuming n+ (n-) is the number of positive (negative) examples, then

C+ / C- = n- / n+

Therefore I don't think a large C+ leads to a stronger weight for the positive class.
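This n-/n+ heuristic is essentially what scikit-learn's class_weight='balanced' option computes (weights proportional to n_samples / (n_classes * n_c)); a quick check, assuming a current scikit-learn:

```python
import numpy as np
from sklearn.utils.class_weight import compute_class_weight

y = np.array([0] * 80 + [1] * 20)   # imbalanced: n- = 80, n+ = 20

weights = compute_class_weight(class_weight="balanced",
                               classes=np.array([0, 1]), y=y)
print(weights)                      # the minority class gets the larger weight

# The ratio matches the C+ / C- = n- / n+ rule quoted above.
print(weights[1] / weights[0])      # 80 / 20 = 4.0
```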

@pprett
Member
pprett commented Dec 4, 2012

I think we have to be careful: first, I'm referring to Liblinear, not LibSVM; second, I think the semantics of C change depending on the solver. Consider the solver L2R_LR (logistic regression in the primal): see
https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/svm/src/liblinear/linear.cpp#L2257
and
https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/svm/src/liblinear/linear.cpp#L55,
respectively. First, Cp and Cn are copied into an array C of size n_samples; then C[i] multiplies the gradient (or the objective value of the loss function).


@amueller
Member

Ok this is worse than I thought as this means the 'auto' parameter does completely crazy things?!
I have to check what happens in #1464.

@amueller
Member

OK, sorry for the long delay, getting back to it now.
I think the class_weight parameter should do the same thing as in other classifiers, i.e. large means that a class is more likely. This is the opposite to what LibSVM and LibLinear do "natively", as @fannix pointed out.
This is the same as what LibSVM does natively. Large C_+ reduces the number of false negatives, i.e. more samples are classified as being positive. This is the same semantics as sklearn has!

I think my approach will be to write tests that check the expected behavior, and put in some switches so it does what I think it should. The signs and the 0 <-> 1 switch are unfortunately already somewhat messed up :(
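A direction-only test of the kind described might look like this (hypothetical sketch; it asserts the sign of the effect rather than exact counts):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=200, random_state=0)
n_base = LogisticRegression().fit(X, y).predict(X).sum()

# Up-weighting class 0 should never increase the number of predicted 1s,
# and up-weighting class 1 should never decrease it.
n_heavy0 = LogisticRegression(class_weight={0: 10.0, 1: 1.0}).fit(X, y).predict(X).sum()
n_heavy1 = LogisticRegression(class_weight={0: 1.0, 1: 10.0}).fit(X, y).predict(X).sum()

assert n_heavy0 <= n_base <= n_heavy1
print("direction invariants hold")
```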

@amueller
Member

Corrected my statement above. Larger C_i means more samples are classified as class i!

@amueller
Member

There is a fix in #1491.

@ShusenLiu
Author

Thanks for all your hard work ^.^

@amueller
Member

No problem :)
Thanks for reporting and your patience. The fix will be in the next release (which will be soon).

@amueller
Member
amueller commented Jan 3, 2013

Merged #1491 so this is fixed in master now :)

@amueller amueller closed this as completed Jan 3, 2013