RandomForestRegressor in GridSearchCV uses more cores than specified.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

n_samples, n_features = 10000, 10
X = np.random.randn(n_samples, n_features)
y = np.random.randn(n_samples)
grid = {'n_estimators': [100, 200]}

# only one core is used (GOOD)
rfr = RandomForestRegressor(n_estimators=100)
rfr.fit(X, y)

# only two cores are used (GOOD)
rfr = RandomForestRegressor(n_estimators=100, n_jobs=1)
gsc = GridSearchCV(rfr, grid, n_jobs=2)
gsc.fit(X, y)

# more than two cores are used (BUG)
rfr = RandomForestRegressor(n_estimators=100)
gsc = GridSearchCV(rfr, grid, n_jobs=2)
gsc.fit(X, y)
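To illustrate why the inner n_jobs matters once the forest runs inside GridSearchCV, here is a small stand-in that uses only the standard library. An outer thread pool plays the role of GridSearchCV(n_jobs=2), and each outer task opens its own inner pool, roughly as a forest fitting its trees in parallel would. This is a sketch under simplifying assumptions, not how scikit-learn or joblib is implemented (the real outer workers are separate processes); nested_run and ConcurrencyTracker are hypothetical names, and the tracker counts concurrently running tasks, not CPU cores.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class ConcurrencyTracker:
    """Records the peak number of inner tasks running at the same time."""

    def __init__(self):
        self._lock = threading.Lock()
        self._running = 0
        self.peak = 0

    def task(self, duration=0.01):
        with self._lock:
            self._running += 1
            self.peak = max(self.peak, self._running)
        time.sleep(duration)  # stand-in for the work of fitting one tree
        with self._lock:
            self._running -= 1


def nested_run(outer_jobs, inner_jobs, n_outer_tasks=4, n_inner_tasks=8):
    """Hypothetical helper: outer pool of outer_jobs workers, each outer
    task running n_inner_tasks on its own pool of inner_jobs workers."""
    tracker = ConcurrencyTracker()

    def outer_task(_):
        # One outer task ~ one grid point; it creates its own inner pool.
        with ThreadPoolExecutor(max_workers=inner_jobs) as inner:
            list(inner.map(lambda _: tracker.task(), range(n_inner_tasks)))

    with ThreadPoolExecutor(max_workers=outer_jobs) as outer:
        list(outer.map(outer_task, range(n_outer_tasks)))
    return tracker.peak


print(nested_run(2, 1))  # never more than 2 inner tasks at once
print(nested_run(2, 4))  # can reach up to 2 * 4 = 8
```

With inner_jobs=1 the peak is bounded by the outer pool size, which mirrors the second (GOOD) case above; any larger inner pool raises the bound to outer_jobs * inner_jobs.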
This is indeed a bug in joblib. It is related to the prefer='threads' argument.
RandomForestRegressor.fit is called with a ThreadingBackend and n_jobs=-1, because the prefer parameter is passed to the Parallel instance. RandomForestRegressor.predict is called with the SequentialBackend, and the difference between these two calls is that prefer is not passed to Parallel in predict.
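Taking at face value the observation that fit ends up with n_jobs=-1, the resulting oversubscription can be worked out from joblib's documented n_jobs convention: None means 1 (unless a parallel_backend context says otherwise), -1 means all CPUs, and values below -1 give n_cpus + 1 + n_jobs. The helpers below are hypothetical, take n_cpus as an explicit argument instead of querying the machine, and ignore the context-manager case for None.

```python
def resolve_n_jobs(n_jobs, n_cpus):
    """Hypothetical helper mirroring joblib's documented n_jobs convention."""
    if n_jobs is None:
        return 1  # "unset" is treated as sequential here
    if n_jobs == 0:
        raise ValueError("n_jobs == 0 has no meaning")
    if n_jobs < 0:
        # -1 -> all CPUs, -2 -> all but one, and so on (never below 1)
        return max(n_cpus + 1 + n_jobs, 1)
    return n_jobs


def peak_workers(outer_n_jobs, inner_n_jobs, n_cpus):
    """Upper bound on concurrent workers when every outer worker starts
    its own inner pool (nested parallelism multiplies)."""
    return resolve_n_jobs(outer_n_jobs, n_cpus) * resolve_n_jobs(inner_n_jobs, n_cpus)


# Assuming an 8-CPU machine:
print(peak_workers(2, 1, 8))   # 2: GridSearchCV(n_jobs=2) with forest n_jobs=1
print(peak_workers(2, -1, 8))  # 16: the inner fit resolved to n_jobs=-1
```

The point of the arithmetic is that nested parallelism multiplies: two outer workers that each start an all-CPU inner pool request twice as many workers as there are cores.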
I will investigate further and try to come up with a fix in joblib.
So do we close this? Or do we wait for a new joblib release to be merged into scikit-learn? Or wait for unvendoring? I'm not sure how to handle the rapid joblib changes with respect to releasing 0.20.1.
This bug was introduced in #11741. @ogrisel @tomMoral