n_jobs in GridSearchCV issue #6147
Comments
I can reproduce the problem on my machine. Here is the code I used:
import numpy as np
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.grid_search import GridSearchCV
from sklearn.cross_validation import StratifiedKFold
from sklearn.datasets import make_classification

# One million samples, so that the data is large enough to be memmapped.
X_train, y_train = make_classification(n_samples=int(1e6), n_features=5, random_state=0)
model = ExtraTreesClassifier(class_weight='balanced')
# 'max_depth' was listed twice; only the last value took effect, so it is kept.
parameters = {'criterion': ['gini', 'entropy'],
              'min_samples_split': [2, 4, 8],
              'max_depth': [3, 10, 20]}
clf = GridSearchCV(model, parameters, verbose=3, scoring='roc_auc',
                   cv=StratifiedKFold(y_train, n_folds=5, shuffle=True),
                   n_jobs=4)
# Fitting runs the parallel search.
clf.fit(X_train, y_train)
The data needs to be big enough to trigger the memmapping. |
@ogrisel FYI, the call to [...]
We are also having the same problem with random forest and logistic regression classifiers within our application. Inside of [...]
# We added this
if isinstance(score, np.core.memmap):
    score = np.float(score)
if not isinstance(score, numbers.Number):
    raise ValueError("scoring must return a number, got %s (%s) instead."
                     % (str(score), type(score)))
Another bit of info, downgrading scikit to [...]
We are deciding if we should deploy the hack because right now our application depends on [...]
For what it's worth, we forked it and ran the unit tests locally and they passed. See the fork below by @nfcampos. |
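The workaround quoted above can be sketched in standalone form. This is a minimal illustration, not the fix that was merged: `coerce_score` and `ZeroDimScore` are hypothetical names, and `ZeroDimScore` is only a stand-in for an array-backed scalar (such as a 0-d memmap) so that the sketch runs without NumPy. Instead of special-casing one array type, it accepts any object that implements `__float__`.

```python
import numbers


def coerce_score(score):
    """Return `score` as a plain number, or raise ValueError."""
    # Plain numbers pass through unchanged.
    if isinstance(score, numbers.Number):
        return score
    # Objects that implement __float__ are converted; strings are
    # rejected because str does not define __float__.
    if hasattr(score, "__float__"):
        try:
            return float(score)
        except (TypeError, ValueError):
            pass
    raise ValueError(
        "scoring must return a number, got %s (%s) instead."
        % (str(score), type(score))
    )


class ZeroDimScore:
    """Hypothetical stand-in for an array-backed scalar score."""

    def __init__(self, value):
        self._value = value

    def __float__(self):
        return float(self._value)


wrapped = ZeroDimScore(0.9762644725410562)
print(isinstance(wrapped, numbers.Number))  # the check that triggers the error
print(coerce_score(wrapped))                # a plain float after coercion
```

Coercing at the point where the score is validated keeps the type check strict for genuinely non-numeric results while tolerating scalar-like wrappers.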
@ogrisel this one also for 0.17.1? |
+1 working on it. |
Fixed in #6225, should be part of 0.17.1. |
* tag '0.17.1': (29 commits) Release 0.17.1 MAINT remove non-existing cache folder in 0.17.X branch FIX cythonize TSNE MAINT simplify freeing logic for Barnes-Hut SNE memory leak fix Fix memory leak in Barnes-Hut SNE FIX check_build_doc.py false positive detections MAINT more informative output to circle/check_build_doc.py FIX fetch_california_housing FIX in randomized_svd flip sign Updated examples and tests that use scipy's lena DOC whats_new entry for scikit-learn#6258 fix joblib error in LatentDirichletAllocation MAINT fix / speedup travis on 0.17.X MAINT Upgrade pip in appveyor and display version DOC missing changelog entry for scikit-learn#5857 DOC add fix for scikit-learn#6147 to the changelog FIX 6147: ensure that AUC is always a float TST non-regression test for scikit-learn#6147, roc_auc on memmap data Added changelog entry about scikit-learn#6196 Fix reading of bunch pickles ...
Issues with cross_val_score too:
ValueError: scoring must return a number, got 0.9762644725410562 (<class 'numpy.core.memmap.memmap'>) instead.
sklearn.externals.joblib.my_exceptions.JoblibValueError: JoblibValueError |
@MarkRConway what's your scikit-learn version? It should be fixed in 0.17.1. |
Yes, it is fixed. Thank you. |
This problem is back in 0.18; downgrading to 0.17.1 resolves it. |
@eetuko can you please open a new issue with code to reproduce? Also, did you use 0.18 or 0.18.1? |
scikit-learn==0.19 works well |
Hi,
First, thanks for your awesome work!
I have an issue with GridSearchCV and n_jobs for an ExtraTreesClassifier model.
platform.platform(): Linux-3.13.0-74-generic-x86_64-with-debian-jessie-sid
cpu_count(): 8
sklearn.__version__: '0.17'
numpy.__version__: '1.10.4'
scipy.__version__: '0.16.1'
pandas.__version__: '0.17.1'
joblib.__version__: '0.9.3'
Code KO:
Sub-process traceback:
If I set n_jobs to 8 on the model and n_jobs to 1 on GridSearchCV, it works.
I tried different setups, but whenever n_jobs > 1 on GridSearchCV, it fails.
I would like to make full use of my CPU, and I think n_jobs > 1 on GridSearchCV is better than n_jobs on the model. Maybe someone has feedback?
Possible relation with #6023