Description
BaggingClassifier with base_estimator=LinearSVC(), n_estimators=10, n_jobs=10,
max_samples=0.1 takes the same time to train as LinearSVC().
Steps/Code to Reproduce
from sklearn.svm import LinearSVC
from sklearn.ensemble import BaggingClassifier

base_estimator = LinearSVC(random_state=42, tol=1e-6)
n_estimators = 10
max_samples = 1.0 / n_estimators  # each estimator should see only 10% of the data

clf = BaggingClassifier(base_estimator, n_estimators=n_estimators,
                        n_jobs=10, max_samples=max_samples)
clf.fit(X_trn, y_trn)  # X_trn, y_trn: my training data
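For reference, a self-contained variant of the above that can be timed end to end (just a sketch: the make_classification dataset and the train/test split are stand-ins for my actual data, so the exact numbers will differ):

# Self-contained sketch of the reproduction: make_classification is only a
# stand-in for my real X_trn/y_trn; the timing/accuracy loop is a rough way
# to quantify "same time" and "same accuracy".
import time
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn.svm import LinearSVC
from sklearn.ensemble import BaggingClassifier

X, y = make_classification(n_samples=200000, n_features=50, random_state=42)
X_trn, X_tst, y_trn, y_tst = train_test_split(X, y, random_state=42)

n_estimators = 10
models = {
    "LinearSVC": LinearSVC(random_state=42, tol=1e-6),
    "BaggingClassifier": BaggingClassifier(
        LinearSVC(random_state=42, tol=1e-6),
        n_estimators=n_estimators, n_jobs=10,
        max_samples=1.0 / n_estimators),
}
for name, model in models.items():
    start = time.perf_counter()
    model.fit(X_trn, y_trn)
    elapsed = time.perf_counter() - start
    acc = accuracy_score(y_tst, model.predict(X_tst))
    print(f"{name}: fit {elapsed:.1f}s, test accuracy {acc:.4f}")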
Expected Results
I expected it to train roughly 10 times faster: each of the 10 estimators should be fit on only 10% of the data, and they run in parallel across 10 jobs.
Actual Results
Trains for the same amount of time as its base estimator LinearSVC().
Produces exactly the same accuracy as LinearSVC(), which is also strange.
When I monkey-patched the base estimator's fit() method:
# Pretend that base_estimator.fit() doesn't support "sample_weight"
def fit_no_sample_weight(estimator, X, y):
    return estimator._original_fit(X, y)
base_estimator._original_fit = base_estimator.fit
base_estimator.fit = fit_no_sample_weight.__get__(base_estimator, LinearSVC)
it trained as expected: about 10 times faster, with lower accuracy.
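A cleaner way to get the same effect, sketched below (NoSampleWeightLinearSVC is my own name, not anything from scikit-learn), is a thin LinearSVC subclass whose fit() signature simply doesn't advertise sample_weight; judging by the monkey-patch experiment, that is enough to make BaggingClassifier actually subsample:

# Same workaround without monkey-patching: the subclass's fit() signature has
# no sample_weight parameter, so BaggingClassifier no longer detects
# sample-weight support on the base estimator.
from sklearn.svm import LinearSVC
from sklearn.ensemble import BaggingClassifier

class NoSampleWeightLinearSVC(LinearSVC):
    def fit(self, X, y):  # deliberately no sample_weight argument
        return super().fit(X, y)

clf = BaggingClassifier(NoSampleWeightLinearSVC(random_state=42, tol=1e-6),
                        n_estimators=10, n_jobs=10, max_samples=0.1)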
Versions
System:
python: 3.6.8 |Anaconda, Inc.| (default, Dec 30 2018, 01:22:34) [GCC 7.3.0]
executable: /foo/anaconda3/bin/python
machine: Linux-3.16.36begun-x86_64-with-centos-7.3.1611-Core
BLAS:
macros: SCIPY_MKL_H=None, HAVE_CBLAS=None
lib_dirs: /foo/anaconda3/lib
cblas_libs: mkl_rt, pthread
Python deps:
pip: 19.0.3
setuptools: 40.8.0
sklearn: 0.21.2
numpy: 1.16.2
scipy: 1.2.1
Cython: 0.29.5
pandas: 0.24.1