RandomForestRegressor in GridSearchCV uses more cores than specified · Issue #12289 · scikit-learn/scikit-learn · GitHub

Closed
TomDLT opened this issue Oct 4, 2018 · 4 comments

@TomDLT
Member
TomDLT commented Oct 4, 2018

RandomForestRegressor in GridSearchCV uses more cores than specified.

import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

n_samples, n_features = 10000, 10
X = np.random.randn(n_samples, n_features)
y = np.random.randn(n_samples)

grid = {'n_estimators': [100, 200]}

# only one core is used (GOOD)
rfr = RandomForestRegressor(n_estimators=100)
rfr.fit(X, y)

# only two cores are used (GOOD)
rfr = RandomForestRegressor(n_estimators=100, n_jobs=1)
gsc = GridSearchCV(rfr, grid, n_jobs=2)
gsc.fit(X, y)

# more than two cores are used (BUG)
rfr = RandomForestRegressor(n_estimators=100)
gsc = GridSearchCV(rfr, grid, n_jobs=2)
gsc.fit(X, y)

This bug was introduced in #11741. @ogrisel @tomMoral

TomDLT added the Bug label Oct 4, 2018
TomDLT added this to the 0.20.1 milestone Oct 4, 2018
@tomMoral
Contributor
tomMoral commented Oct 5, 2018

This is indeed a bug in joblib. It has to do with prefer='threads'.

The RandomForestRegressor.fit method is called with a ThreadingBackend and n_jobs=-1, because the parameter prefer is passed to the Parallel instance. The RandomForestRegressor.predict method is called with the SequentialBackend; the difference between the two calls is that the parameter prefer is not passed there.
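To see the effect of `prefer='threads'` outside scikit-learn, here is a minimal sketch that counts how many distinct threads a joblib `Parallel` call actually uses. It does not reproduce the nested-backend bug itself. It only shows how an explicit `n_jobs` is expected to cap the thread count. The helper names `worker_thread_id` and `count_worker_threads` are made up for this illustration.

```python
import threading
import time

from joblib import Parallel, delayed


def worker_thread_id(_):
    # Hypothetical helper: sleep briefly so tasks overlap, then report
    # the identifier of the thread that executed this task.
    time.sleep(0.01)
    return threading.get_ident()


def count_worker_threads(n_jobs, n_tasks=20):
    """Return how many distinct threads ran the tasks for a given n_jobs."""
    ids = Parallel(n_jobs=n_jobs, prefer="threads")(
        delayed(worker_thread_id)(i) for i in range(n_tasks)
    )
    return len(set(ids))


# n_jobs=1 runs the tasks sequentially in the calling thread.
sequential = count_worker_threads(n_jobs=1)
# n_jobs=2 with prefer="threads" should never use more than two threads.
threaded = count_worker_threads(n_jobs=2)
print(sequential, threaded)
```

The same kind of check could be run from inside a function dispatched by an outer `Parallel(n_jobs=2)` to see whether the inner call respects the outer limit.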

I will further investigate and try to come up with a fix in joblib.

@tomMoral
Contributor
tomMoral commented Oct 5, 2018

I opened an issue in joblib/joblib#784 and a PR in joblib/joblib#785

@amueller
Member

So do we close this? Or wait for the new joblib to be released and merged into scikit-learn? Or wait for unvendoring? I'm not sure what to do about the rapid joblib changes with respect to releasing 0.20.1.

@ogrisel
Member
ogrisel commented Nov 7, 2018

This was fixed by merging joblib 0.13.0 (I have just tried all the cases reported by @TomDLT's snippet and they behave as expected).
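A quick way to check whether an environment has a joblib at least as new as 0.13.0 might look like the sketch below. It assumes scikit-learn is using the installed joblib rather than a vendored copy. `version_tuple` and `has_fix` are hypothetical helpers, and the parsing is deliberately simplistic: it ignores pre-release suffixes such as `rc1`.

```python
import re


def version_tuple(version):
    """Extract up to three leading numeric components as a tuple of ints."""
    parts = []
    for piece in version.split(".")[:3]:
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    # Pad so that "0.13" compares equal to "0.13.0".
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def has_fix(version, minimum=(0, 13, 0)):
    # Hypothetical check: True if the version is at least joblib 0.13.0.
    return version_tuple(version) >= minimum


try:
    import joblib

    print(joblib.__version__, has_fix(joblib.__version__))
except ImportError:
    print("joblib is not installed")
```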

ogrisel closed this as completed Nov 7, 2018