Description
This bug appears in current master and affects every dense SVM class.
import numpy as np
from sklearn.svm import SVC

X = np.array([[0, 0, 0],
              [0, 0, 1],
              [0, 1, 0],
              [0, 1, 1],
              [1, 0, 0],
              [1, 0, 1],
              [1, 1, 0],
              [1, 1, 1]])
y = np.array([0, 0, 0, 1, 1, 1, 2, 2])
w = np.array([1, 1, 1, 1, 1, 1, 0, 0])  # zero weight for every sample of class 2

f = SVC(kernel='linear', probability=True, random_state=1)
f.fit(X, y, w)
print(f.classes_)
print(f.predict_proba(X))
Output:
[0 1 2]
warning: class label 2 specified in weight is not found
[[ 0.28963492  0.71036508]
 [ 0.39180833  0.60819167]
 [ 0.28963492  0.71036508]
 [ 0.39180833  0.60819167]
 [ 0.57544014  0.42455986]
 [ 0.68293573  0.31706427]
 [ 0.57544014  0.42455986]
 [ 0.68293573  0.31706427]]
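Continuing from the script above, the inconsistency is easy to check directly:

# The wrapper still reports three classes, while the probability
# matrix only has two columns.
print(len(f.classes_))           # 3
print(f.predict_proba(X).shape)  # (8, 2) instead of the expected (8, 3)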
Here we see that libsvm has internally dropped the 2nd class, while sklearn's wrapper keeps all three class labels; as a result, predict_proba returns a matrix of shape (n_samples, 2) instead of (n_samples, 3), which is what the bagging classifier implementation expects. I understand that this usage of weights is absurd on its own, but in combination with bagging and a dataset with many labels, bagging randomly zeroes out complete classes and the bug shows itself, because bagging expects each SVM to return probabilities for all of the classes it holds.
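For context, here is a minimal sketch of the kind of setup where I ran into this (the dataset and parameters are made up for illustration, not the exact code from my application). When the base estimator supports sample_weight, BaggingClassifier passes the bootstrap counts as weights, so with only a few samples per class a resample can give an entire class zero total weight:

import numpy as np
from sklearn.ensemble import BaggingClassifier
from sklearn.svm import SVC

rng = np.random.RandomState(1)
X = rng.rand(40, 3)
y = rng.randint(0, 5, 40)  # several classes, only a few samples each

# Because SVC.fit accepts sample_weight, bagging encodes each bootstrap
# resample as a weight vector; a resample that gives some class zero
# total weight makes that SVC return too few probability columns, and
# combining the per-estimator probabilities then fails or misbehaves,
# depending on the version.
clf = BaggingClassifier(SVC(kernel='linear', probability=True),
                        n_estimators=20, random_state=1)
clf.fit(X, y)
print(clf.predict_proba(X).shape)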
I investigated this a little and can try to fix it, if someone confirms that this usage with bagging makes sense (because I'm not really sure about that).