8000 Dense svm and zeroed weight for samples of entire class · Issue #5150 · scikit-learn/scikit-learn · GitHub
Open
@olologin

Description


This bug appears in current master, for any dense SVM class.

import numpy as np
from sklearn.svm import SVC
X = np.array([[0, 0, 0],
              [0, 0, 1],
              [0, 1, 0],
              [0, 1, 1],
              [1, 0, 0],
              [1, 0, 1],
              [1, 1, 0],
              [1, 1, 1]])
y = np.array([0, 0, 0, 1, 1, 1, 2, 2])
w = np.array([1, 1, 1, 1, 1, 1, 0, 0])


f = SVC(kernel='linear', probability=True, random_state=1)
f.fit(X, y, sample_weight=w)
print(f.classes_)
print(f.predict_proba(X))

Output:

[0 1 2]
warning: class label 2 specified in weight is not found
[[ 0.28963492  0.71036508]
 [ 0.39180833  0.60819167]
 [ 0.28963492  0.71036508]
 [ 0.39180833  0.60819167]
 [ 0.57544014  0.42455986]
 [ 0.68293573  0.31706427]
 [ 0.57544014  0.42455986]
 [ 0.68293573  0.31706427]]

Here we see that libsvm has internally dropped the 2nd class, while sklearn's wrapper still keeps all three class labels; that is why predict_proba returns a matrix of shape (n_samples, 2) instead of (n_samples, 3), which is what the bagging classifier implementation expects. I understand that this use of weights is insane by itself, but together with bagging on a dataset with many labels, bagging randomly zeroes out entire classes and this bug shows up, because bagging expects each SVM to return probabilities for all the classes it holds.
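If the wrapper exposed the classes that libsvm actually trained on, a caller such as bagging could remap the narrow probability matrix onto the full class set. A minimal sketch of that remapping (the helper name `expand_proba` is hypothetical, not part of scikit-learn):

```python
import numpy as np

def expand_proba(proba, sub_classes, all_classes):
    """Place columns of `proba` (over the fitted subset of classes)
    into a zero-filled matrix over the full, sorted class set."""
    full = np.zeros((proba.shape[0], len(all_classes)))
    # Column positions of the fitted classes within the full class set
    idx = np.searchsorted(all_classes, sub_classes)
    full[:, idx] = proba
    return full
```

This is essentially the bookkeeping bagging would need: missing classes get probability 0, and column order stays consistent with the full `classes_` array.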

I investigated this a little and can try to fix it, if someone confirms that this usage with bagging makes sense (because I'm not really sure about that).
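In the meantime, one workaround (a sketch on the issue's own example, not a fix in scikit-learn itself) is to drop zero-weight samples before fitting, so that the wrapper's `classes_` and libsvm's trained classes agree:

```python
import numpy as np
from sklearn.svm import SVC

X = np.array([[0, 0, 0], [0, 0, 1], [0, 1, 0], [0, 1, 1],
              [1, 0, 0], [1, 0, 1], [1, 1, 0], [1, 1, 1]])
y = np.array([0, 0, 0, 1, 1, 1, 2, 2])
w = np.array([1, 1, 1, 1, 1, 1, 0, 0])

# Filter out zero-weight samples so no class is silently lost inside libsvm.
mask = w > 0
f = SVC(kernel='linear', random_state=1)
f.fit(X[mask], y[mask], sample_weight=w[mask])
print(f.classes_)  # now reports only the classes libsvm actually saw
```

After this, `classes_` is consistent with the prediction outputs, which is a state bagging can already handle for estimators fitted on a subset of classes.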
