Weird behavior in LogisticRegression on parameter class_weight · Issue #1411 · scikit-learn/scikit-learn · GitHub
Weird behavior in LogisticRegression on parameter class_weight #1411
@ShusenLiu

Description

The class_weight parameter sets a different misclassification penalty for each class. For example, in a 0/1 classification problem, if we set class_weight={0:0.95, 1:0.05}, we can expect the classifier to favor label 0, since misclassifying a 0 as a 1 is heavily penalized.
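To make the expectation concrete, here is a minimal sketch of a class-weighted log loss. It assumes the usual formulation in which each sample's loss is multiplied by the weight of its true class; the function name weighted_log_loss is hypothetical and this is not scikit-learn's internal code.

```python
import math

def weighted_log_loss(y_true, p_pred, class_weight):
    """Mean class-weighted log loss; p_pred[i] is the predicted P(y=1)."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        # Probability the model assigned to the true label.
        p_true = p if y == 1 else 1.0 - p
        # The penalty is scaled by the weight of the sample's true class.
        total += -class_weight[y] * math.log(p_true)
    return total / len(y_true)

weights = {0: 0.95, 1: 0.05}
# The same confident mistake (about 0.1 probability on the true label) for each class.
loss_miss_0 = weighted_log_loss([0], [0.9], weights)  # true 0, predicted as 1
loss_miss_1 = weighted_log_loss([1], [0.1], weights)  # true 1, predicted as 0
print(loss_miss_0 / loss_miss_1)
```

Under this assumption, the same mistake costs roughly 19 times more on a class-0 sample than on a class-1 sample, so a fitted model should lean toward predicting 0.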

But the LogisticRegression class seems to get this wrong:

from sklearn import datasets
from sklearn import svm
from sklearn import linear_model

# 100 samples; half are labeled 0, the rest are labeled 1
X, Y = datasets.make_classification()
Y.sum()
>>> 50 

# LR classifier with balanced (default) class weights
clr0 = linear_model.LogisticRegression()
clr0.fit(X, Y)
clr0.score(X,Y)
>>> 0.84999999999999998
clr0.predict(X).sum()
>>> 49

# LR classifier with imbalanced class weights
clr1 = linear_model.LogisticRegression(class_weight={0:0.9, 1:0.1})
clr1.fit(X, Y)
clr1.score(X,Y)
>>> 0.63
clr1.predict(X).sum()
>>> 85

The imbalanced classifier clr1 is supposed to classify more of the data as label 0, but it actually predicts far more of the data to be 1. When we choose another classifier, say SVM, the behavior seems reasonable:

# SVM classifier with balanced (default) class weights
clr2 = svm.SVC()
clr2.fit(X,Y)
clr2.score(X,Y)
>>> 0.95999999999999996
clr2.predict(X).sum()
>>> 46

# SVM classifier with imbalanced class weights
clr3 = svm.SVC(class_weight={0:0.6, 1:0.4})
clr3.fit(X,Y)
clr3.score(X,Y)
>>> 0.84999999999999998
clr3.predict(X).sum()
>>> 35.0

# another SVM classifier with more strongly imbalanced class weights
clr4 = svm.SVC(class_weight={0:0.9, 1:0.1})
clr4.fit(X,Y)
clr4.score(X,Y)
>>> 0.5
clr4.predict(X).sum()
>>> 0.0

When class_weight[0] vs class_weight[1] is 5:5, SVC predicts roughly half of the data to be 0.
When class_weight[0] vs class_weight[1] is 6:4, SVC predicts more of the data to be 0.
When class_weight[0] vs class_weight[1] is 9:1, SVC predicts all of the data to be 0.
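The direction of this trend can be sanity-checked without scikit-learn. The sketch below fits a one-feature logistic regression by plain gradient descent on a class-weighted log loss and counts the predicted 1s for the same three weight ratios. The toy data, the helper names fit_weighted_logreg and count_predicted_ones, and the learning rate and step count are all made up for illustration; this is not the solver LogisticRegression uses.

```python
import math

def fit_weighted_logreg(xs, ys, class_weight, lr=0.1, steps=5000):
    """Fit p(y=1|x) = sigmoid(w*x + b) by gradient descent on weighted log loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        grad_w = grad_b = 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(w * x + b)))
            # Each sample's gradient is scaled by the weight of its true class.
            err = class_weight[y] * (p - y)
            grad_w += err * x
            grad_b += err
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b

def count_predicted_ones(xs, w, b):
    # Predict 1 when the decision function is positive.
    return sum(1 for x in xs if w * x + b > 0)

# Hypothetical toy data: two overlapping classes on a single feature.
xs = [-3, -2, -1, 1, -1, 1, 2, 3]
ys = [0, 0, 0, 0, 1, 1, 1, 1]

results = {}
for w0 in (0.5, 0.6, 0.9):
    weights = {0: w0, 1: 1.0 - w0}
    w, b = fit_weighted_logreg(xs, ys, weights)
    results[w0] = count_predicted_ones(xs, w, b)
print(results)
```

On this toy problem the number of predicted 1s does not increase as the weight on class 0 grows, which matches the SVC results above and is the opposite of what clr1 shows.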

Is this a bug?
