8000 Dense svm and zeroed weight for samples of entire class · Issue #5150 · scikit-learn/scikit-learn · GitHub
Open
@olologin

Description


This bug appears in current master, for any dense SVM class.

import numpy as np
from sklearn.svm import SVC
X = np.array([[0, 0, 0],
              [0, 0, 1],
              [0, 1, 0],
              [0, 1, 1],
              [1, 0, 0],
              [1, 0, 1],
              [1, 1, 0],
              [1, 1, 1]])
y = np.array([0, 0, 0, 1, 1, 1, 2, 2])
w = np.array([1, 1, 1, 1, 1, 1, 0, 0])


f = SVC(kernel='linear', probability=True, random_state=1)
f.fit(X, y, sample_weight=w)
print(f.classes_)
print(f.predict_proba(X))

Output:

[0 1 2]
warning: class label 2 specified in weight is not found
[[ 0.28963492  0.71036508]
 [ 0.39180833  0.60819167]
 [ 0.28963492  0.71036508]
 [ 0.39180833  0.60819167]
 [ 0.57544014  0.42455986]
 [ 0.68293573  0.31706427]
 [ 0.57544014  0.42455986]
 [ 0.68293573  0.31706427]]

Here we see that libsvm has internally dropped the 2nd class, while sklearn's wrapper still keeps all three class labels; that is why predict_proba returns a matrix of shape (n_samples, 2) instead of (n_samples, 3), which is what the bagging classifier implementation expects. I understand that this use of weights is insane by itself, but together with bagging on a dataset with many labels, bagging randomly zeroes out entire classes and this bug shows up, because bagging expects each SVM to return probabilities for all the classes it holds.
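If the wrapper exposed the classes that libsvm actually trained on, a caller such as bagging could remap the narrow probability matrix onto the full class set. A minimal sketch of that remapping (the helper name `expand_proba` is hypothetical, not part of scikit-learn):

```python
import numpy as np

def expand_proba(proba, sub_classes, all_classes):
    """Place columns of `proba` (over the fitted subset of classes)
    into a zero-filled matrix over the full, sorted class set."""
    full = np.zeros((proba.shape[0], len(all_classes)))
    # Column positions of the fitted classes within the full class set
    idx = np.searchsorted(all_classes, sub_classes)
    full[:, idx] = proba
    return full
```

This is essentially the bookkeeping bagging would need: missing classes get probability 0, and column order stays consistent with the full `classes_` array.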

I investigated this a little and can try to fix it, if someone confirms that this usage with bagging makes sense (because I'm not really sure about that).
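In the meantime, one workaround (a sketch on the issue's own example, not a fix in scikit-learn itself) is to drop zero-weight samples before fitting, so that the wrapper's `classes_` and libsvm's trained classes agree:

```python
import numpy as np
from sklearn.svm import SVC

X = np.array([[0, 0, 0], [0, 0, 1], [0, 1, 0], [0, 1, 1],
              [1, 0, 0], [1, 0, 1], [1, 1, 0], [1, 1, 1]])
y = np.array([0, 0, 0, 1, 1, 1, 2, 2])
w = np.array([1, 1, 1, 1, 1, 1, 0, 0])

# Filter out zero-weight samples so no class is silently lost inside libsvm.
mask = w > 0
f = SVC(kernel='linear', random_state=1)
f.fit(X[mask], y[mask], sample_weight=w[mask])
print(f.classes_)  # now reports only the classes libsvm actually saw
```

After this, `classes_` is consistent with the prediction outputs, which is a state bagging can already handle for estimators fitted on a subset of classes.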
