SVC with ADABoosting #16642
Comments
Please provide runnable code so that we can try to reproduce the issue.
I have attached a zip of a Python script and a Jupyter notebook with the issue. Is this the right way to post it? Probably should have used a gist? As a note, I have tried multiple ways to scale the input data, and the same issue happens.
You can directly post the example here between triple-backtick markers.
```python
from sklearn.datasets import fetch_openml
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn import preprocessing

# Load MNIST and scale the pixel values to [0, 1].
mnist = fetch_openml('mnist_784', cache=False)
X = mnist.data.astype('float32')
y = mnist.target.astype('int64')
# X = preprocessing.scale(X)  # standard scaling was also tried; same issue
X /= 255.0

# Subsample to keep the SVC fits tractable.
size = 10000
train_x = X[:size]
train_y = y[:size]
X_train, X_test, Y_train, Y_test = train_test_split(
    train_x, train_y, test_size=0.6, shuffle=True
)

# AdaBoost ensemble with SVC as the base estimator.
abc = AdaBoostClassifier(
    SVC(random_state=0, probability=True, tol=1e-5, gamma=.01),
    n_estimators=3, learning_rate=.9
)
abc.fit(X_train, Y_train)

# A single SVC with the same hyper-parameters for comparison.
svc = SVC(random_state=0, probability=True, tol=1e-5, gamma=.01)
svc.fit(X_train, Y_train)

print("base acc")
print(abc.estimators_[0].score(X_test, Y_test))
print("svc with same training")
print(svc.score(X_test, Y_test))
```
Hi @QuantumChamploo, I think this is the expected behavior, as described in the documentation.
Still, it's weird to get such a lower accuracy just by uniformly reweighing the first fit by 1 / n_samples.
This is probably related to #15657: some (most?) scikit-learn estimators already normalize sample weights internally, so they are unaffected. On the other hand, I would not have expected SVC to give such poor results when we reweigh uniformly. Maybe this is because the regularizer term of the loss function then starts to dominate the data-fit term, and we get a constant predictor as a result.
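A minimal sketch (mine, not from the thread) that illustrates this hypothesis on the digits dataset: fitting the same SVC with uniform sample weights of 1 / n_samples, which is what AdaBoost passes to the base estimator on the first iteration, should give much lower accuracy than an unweighted fit.

```python
import numpy as np
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = datasets.load_digits(return_X_y=True)
X /= 16
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.6, shuffle=True, random_state=0
)

svc = SVC(C=1, gamma=.01, random_state=0)
print("unweighted fit:",
      svc.fit(X_train, y_train).score(X_test, y_test))

# Uniform weights of 1 / n_samples, as AdaBoost uses at initialization.
weights = np.full(len(y_train), 1 / len(y_train))
print("uniform 1/n weights:",
      svc.fit(X_train, y_train, sample_weight=weights).score(X_test, y_test))
```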
Indeed, to get the equivalence, one could multiply C by the number of training samples. FYI:

```python
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.ensemble import AdaBoostClassifier
from sklearn.svm import SVC

X, y = datasets.load_digits(return_X_y=True)
X /= 16

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.6, shuffle=True, random_state=0
)

# Compensate for AdaBoost's uniform 1 / n_samples initial weights by
# scaling C up by the number of training samples.
svc = SVC(
    C=1 * len(y_train), probability=True, tol=1e-5, gamma=.01, random_state=0
)
adaboost = AdaBoostClassifier(svc, n_estimators=3, learning_rate=.9)
adaboost.fit(X_train, y_train)

# Refit a standalone SVC with the default C for comparison (AdaBoost clones
# its base estimator, so changing svc here does not affect the ensemble).
svc.set_params(C=1)
svc.fit(X_train, y_train)

print("First weak learner in AdaBoost")
print(adaboost.estimators_[0].score(X_test, y_test))
print("SVC learner alone")
print(svc.score(X_test, y_test))
```
Of course, we still have an issue in scikit-learn because we do not have a consistent formulation of sample_weight handling across estimators (see #15657).
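This works because SVC implements sample_weight by rescaling C per sample, so uniform weights of 1 / n_samples effectively divide C by n_samples. A minimal sketch (mine, not from the thread) verifying the equivalence directly, without AdaBoost:

```python
import numpy as np
from sklearn import datasets
from sklearn.svm import SVC

X, y = datasets.load_digits(return_X_y=True)
X /= 16
n = len(y)

# Unweighted fit with C=1 ...
plain = SVC(C=1, gamma=.01, random_state=0).fit(X, y)
# ... should match a fit with C=n and uniform weights of 1/n, since the
# effective per-sample C is C * weight = n * (1 / n) = 1.
reweighted = SVC(C=1 * n, gamma=.01, random_state=0).fit(
    X, y, sample_weight=np.full(n, 1 / n)
)

# Fraction of identical predictions; expected to be ~1.0 up to the
# numerical tolerance of the solver.
print(np.mean(plain.predict(X) == reweighted.predict(X)))
```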
Original issue description:

I am trying to use SVC with AdaBoost. With SVC, but not with other base estimators, the initial estimator does not seem to be trained. Comparing the initial estimator of the ensemble to a single estimator with the same hyper-parameters:

- Create an AdaBoost ensemble with SVC as the base estimator
- Create a single SVC with the same hyper-parameters
- Compare the accuracies

I have tried different hyper-parameters and had the same issue. The issue does not happen for RandomForest and DecisionTree classifiers.