GridSearchCV with xgboost estimator hangs when n_jobs!=1 · Issue #6627 · scikit-learn/scikit-learn · GitHub


Closed
vzocca opened this issue Apr 5, 2016 · 16 comments
Labels
Documentation Easy Well-defined and straightforward way to resolve Sprint

Comments

vzocca commented Apr 5, 2016

I don't know if this is related to #6147. I am using "The scikit-learn version is 0.18.dev0" and I have no exception though, so this is different.

In any case, this is my code (the data I am using is the same as the data for the Santander kaggle competition, too big to attach).

alg = XGBClassifier(max_depth=4, min_child_weight = 1, n_estimators=1000, learning_rate=0.0202, gamma=0, nthread=4, subsample=0.6815, colsample_bytree=0.701, seed=1, silent=False)

param_test1 = {
 'max_depth':range(3,10,2),
 'min_child_weight':range(1,10,2)
}

gsearch1 = GridSearchCV(estimator = alg, param_grid = param_test1, scoring='roc_auc', iid=False, n_jobs=4, cv=5)
gsearch1.fit(train_data[predictors].as_matrix(),train_data[target].as_matrix())

The program will not crash, will not throw an exception, but will not do anything (activity monitor shows no activity). Quick debugging shows the program enters _fit in grid_search.py but never reaches line 564. I did not debug further. A quick search brought me to issue #6147, and I tried removing the n_jobs variable.

Removing n_jobs from the GridSearchCV call solves the issue.

Member
lesteve commented Apr 6, 2016

> I don't know if this is related to #6147. I am using "The scikit-learn version is 0.18.dev0" and I have no exception though, so this is different.

This is very likely a completely different issue indeed.

Could you post the full code that you are using? It looks like the Santander dataset from Kaggle is not that big, so we could download it and see whether we can reproduce the problem. I am assuming this is the dataset you are talking about: https://www.kaggle.com/c/santander-customer-satisfaction/data.

Author
vzocca commented Apr 6, 2016

Yes, that is the dataset.
There is not much more code besides what I posted. I do some data manipulation (removing constant columns and duplicates) but no more than that, and the issue will remain even if you don't.

So:

# imports (assumed; the original post omitted them)
import pandas
from sklearn.grid_search import GridSearchCV
from xgboost import XGBClassifier

# load data
target = "TARGET"
train_data = pandas.read_csv("Data/train.csv")
test_data = pandas.read_csv("Data/test.csv")

predictors = test_data.columns.values.tolist()
predictors.remove("ID")

# define alg
alg = XGBClassifier(max_depth=4, min_child_weight=1, n_estimators=1000, learning_rate=0.0202, gamma=0, nthread=4, subsample=0.6815, colsample_bytree=0.701, seed=1)

# define the params range
param_test1 = {
    'max_depth': range(3, 10, 2),
    'min_child_weight': range(1, 10, 2)
}

# define the GridSearchCV using the alg defined above
gsearch1 = GridSearchCV(estimator=alg, param_grid=param_test1, scoring='roc_auc', n_jobs=4, iid=False, cv=5)

# fit using the train_data on the target values
gsearch1.fit(train_data[predictors].as_matrix(), train_data[target].as_matrix())
print gsearch1.grid_scores_, gsearch1.best_params_, gsearch1.best_score_

As I mentioned, I am using scikit-learn version 0.18.dev0; I don't know whether this works with previous versions.

Member
lesteve commented Apr 6, 2016

Great, thanks! General comment: the easier you make it to reproduce your problem, the better the quality of feedback you'll get.

Minor comment: you can put python after the opening triple backquotes to get syntax highlighting in your snippet; see this link.

Member
lesteve commented Apr 6, 2016

OK I am guessing you are using xgboost and there may be a bad interaction going on between the xgboost thread pool and multiprocessing from the python stdlib. You can find a bit more details there: https://pythonhosted.org/joblib/parallel.html#bad-interaction-of-multiprocessing-and-third-party-libraries

A few things you could try to see whether the issue goes away:

  • use nthread=1 in XGBClassifier
  • use Python 3 and set the joblib start method to forkserver as mentioned in the previous link

Author
vzocca commented Apr 6, 2016

Thank you lesteve, I removed n_jobs from the GridSearchCV call and that fixed the issue. I might try what you suggest in the future but for now I am happy as it is. I just wanted to make you aware of this issue, and possibly help others who may encounter the same problem, suggesting a possible temporary solution. Thank you.

Member
lesteve commented Apr 6, 2016

Would you be kind enough to just try whether setting nthread=1 in XGBClassifier fixes it as well?

This would let us be more confident that the source of the problem is understood. I don't have xgboost installed unfortunately and I would like to avoid spending too much time on this.

Member
lesteve commented Apr 6, 2016

Probably relevant, from https://github.com/dmlc/xgboost/tree/master/python-package#note:

> If you want to run XGBoost process in parallel using the fork backend for joblib/multiprocessing, you must build XGBoost without support for OpenMP by make no_omp=1. Otherwise, use the forkserver (in Python 3.4) or spawn backend. See the sklearn_parallel.py demo.
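
As a rough sketch of what the note above describes (not code from this thread): the start method can be switched away from fork with the standard-library multiprocessing module before any workers are created. The helper name pick_safe_start_method is hypothetical, and whether joblib honours this setting depends on the joblib version and backend in use, so treat it as an assumption to verify.

```python
import multiprocessing


def pick_safe_start_method(preferred=("forkserver", "spawn")):
    """Return the first start method from `preferred` that this platform offers.

    Hypothetical helper: "fork" is deliberately left out, because forking a
    process that uses OpenMP threads is the combination the note says to avoid.
    """
    available = multiprocessing.get_all_start_methods()
    for method in preferred:
        if method in available:
            return method
    raise RuntimeError("no non-fork start method available on this platform")


def main():
    # Must be called once, before any worker process or pool is created.
    multiprocessing.set_start_method(pick_safe_start_method())
    # ... build the estimator and run GridSearchCV(..., n_jobs=4).fit(X, y) here ...


if __name__ == "__main__":
    main()
```

The main guard matters here: with forkserver or spawn, worker processes re-import the main module, so the search must not run at import time.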

Author
vzocca commented Apr 6, 2016

Of course, I'd be happy to help. Yes, that solves the issue as well (setting nthread=1). I don't have Python 3 and at this point I'd rather not mess with my Python installation. By the way, I am using MacOS and found issue #5115. I don't know enough about the system; could it be related?

Member
lesteve commented Apr 6, 2016

Good to know that setting nthread=1 fixes it for you.

Don't worry about trying to test the solution proposed for Python 3. From the xgboost note mentioned above I am reasonably confident that it would work.

I am afraid you'll have to live with the work-around for now (either setting n_jobs=1 or nthread=1). From what I understand this is a fundamental limitation of multiprocessing in python.
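
Both work-arounds amount to keeping parallelism at a single level. A minimal sketch, assuming scikit-learn 0.18 or later and xgboost are installed; the helper names one_level_parallelism and build_search are hypothetical, and nthread is the keyword used in this thread (newer xgboost releases document n_jobs for the same purpose):

```python
def one_level_parallelism(parallel_level, workers=4):
    """Return (n_jobs, nthread) so that only one layer runs in parallel.

    "grid_search": GridSearchCV uses `workers` processes, xgboost one thread.
    "xgboost":     GridSearchCV runs serially, xgboost uses `workers` threads.
    """
    if parallel_level == "grid_search":
        return workers, 1
    if parallel_level == "xgboost":
        return 1, workers
    raise ValueError("parallel_level must be 'grid_search' or 'xgboost'")


def build_search(parallel_level="grid_search", workers=4):
    # Imported lazily so the helper above works without either library installed.
    from sklearn.model_selection import GridSearchCV
    from xgboost import XGBClassifier

    n_jobs, nthread = one_level_parallelism(parallel_level, workers)
    alg = XGBClassifier(max_depth=4, n_estimators=1000,
                        learning_rate=0.0202, nthread=nthread)
    param_grid = {"max_depth": range(3, 10, 2),
                  "min_child_weight": range(1, 10, 2)}
    return GridSearchCV(estimator=alg, param_grid=param_grid,
                        scoring="roc_auc", n_jobs=n_jobs, cv=5)
```

Calling build_search("xgboost") corresponds to removing n_jobs from the GridSearchCV call, and build_search("grid_search") corresponds to setting nthread=1.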

Author
vzocca commented Apr 6, 2016

I am happy with the work-around. I just think it is useful to document it. Thank you for your time.

Member
raghavrv commented Apr 7, 2016

@lesteve should we close this? @TomDLT

Member
lesteve commented Apr 8, 2016

> @lesteve should we close this? @TomDLT

I think so.

Member
lesteve commented Apr 8, 2016

If someone can edit the title to be something like "GridSearchCV with xgboost estimator hangs when n_jobs!=1" even better.

Author
vzocca commented Apr 8, 2016

Is there a list of known issues? Should this be added there, or should a note be added somewhere explaining why it cannot be solved?

Member
amueller commented Oct 5, 2016

We sometimes add things like this to the FAQ. Or we could actually have a list of known problems.

@amueller amueller added Easy Well-defined and straightforward way to resolve Documentation Need Contributor labels Oct 11, 2016
@amueller amueller added the Sprint label Mar 3, 2017
QtRoS commented Mar 8, 2017

I can confirm this!
Thanks for workaround.
