GridSearchCV cannot be parallelized when custom scoring is used

Hi,

I ran into a problem with the following code:

    # Excerpt from inside learn_cv(X, y); see the traceback below.
    import numpy as np
    from sklearn import ensemble
    from sklearn.metrics import make_scorer
    from sklearn.model_selection import GridSearchCV

    model = ensemble.RandomForestRegressor()
    param = {'n_estimators': [500, 700, 1200],
             'max_depth': [3, 5, 7],
             'max_features': ['auto'],
             'n_jobs': [-1],
             'criterion': ['mae', 'mse'],
             'random_state': [300],
             }

    # Custom loss: mean absolute percentage error (lower is better).
    def my_custom_loss_func(ground_truth, predictions):
        diff = np.abs(ground_truth - predictions) / ground_truth
        return np.mean(diff)

    # greater_is_better=False tells make_scorer to negate the loss.
    loss = make_scorer(my_custom_loss_func, greater_is_better=False)
    model_cv = GridSearchCV(model, param, cv=5, n_jobs=2, scoring=loss, verbose=1)
    model_cv.fit(X, y.ravel())

Here I pass a custom scoring object to GridSearchCV(...) and set n_jobs=2.

I got the following error message:

    C:\Anaconda3\python.exe C:/Users/to/PycharmProjects/Toppan/[10-24]per_machine_vapor_pred_ver2.py
    Fitting 5 folds for each of 18 candidates, totalling 90 fits
    Traceback (most recent call last):
      File "C:/Users/to/PycharmProjects/Toppan/[10-24]per_machine_vapor_pred_ver2.py", line 172, in <module>
        models, scas = learn_all(X_train, y_train)
      File "C:/Users/to/PycharmProjects/Toppan/[10-24]per_machine_vapor_pred_ver2.py", line 108, in learn_all
        models[machine], scas[machine] = learn_cv(X, y)
      File "C:/Users/to/PycharmProjects/Toppan/[10-24]per_machine_vapor_pred_ver2.py", line 87, in learn_cv
        model_cv.fit(X, y.ravel())
      File "C:\Anaconda3\lib\site-packages\sklearn\model_selection\_search.py", line 638, in fit
        cv.split(X, y, groups)))
      File "C:\Anaconda3\lib\site-packages\sklearn\externals\joblib\parallel.py", line 789, in __call__
        self.retrieve()
      File "C:\Anaconda3\lib\site-packages\sklearn\externals\joblib\parallel.py", line 699, in retrieve
        self._output.extend(job.get(timeout=self.timeout))
      File "C:\Anaconda3\lib\multiprocessing\pool.py", line 608, in get
        raise self._value
      File "C:\Anaconda3\lib\multiprocessing\pool.py", line 385, in _handle_tasks
        put(task)
      File "C:\Anaconda3\lib\site-packages\sklearn\externals\joblib\pool.py", line 371, in send
        CustomizablePickler(buffer, self._reducers).dump(obj)
    AttributeError: Can't pickle local object 'learn_cv.<locals>.my_custom_loss_func'

    Process finished with exit code 1

The program seems to run only when n_jobs is set to 1.

Any ideas?
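
The last line of the traceback points at pickling rather than at scoring itself: with n_jobs=2 the scorer is pickled to be sent to a worker process, and my_custom_loss_func is a local object of learn_cv. The sketch below reproduces only that behaviour with the standard library. The names mape_loss, make_local_loss and can_pickle are made up for illustration, and the loss is written in plain Python instead of NumPy so the snippet runs on its own.

```python
import pickle


def mape_loss(ground_truth, predictions):
    """Mean absolute percentage error, defined at module level."""
    errors = [abs(t - p) / t for t, p in zip(ground_truth, predictions)]
    return sum(errors) / len(errors)


def make_local_loss():
    """Return a loss function defined inside another function,
    mirroring my_custom_loss_func inside learn_cv."""
    def local_loss(ground_truth, predictions):
        return mape_loss(ground_truth, predictions)
    return local_loss


def can_pickle(obj):
    """Report whether obj can be serialized with pickle.dumps."""
    try:
        pickle.dumps(obj)
        return True
    except Exception:
        return False


if __name__ == "__main__":
    # A module-level function is pickled by its qualified name.
    print("module-level:", can_pickle(mape_loss))
    # A function nested in another function cannot be looked up by name.
    print("local:", can_pickle(make_local_loss()))
```

If this is indeed the cause, moving my_custom_loss_func out of learn_cv to the top level of the script, and still wrapping it with make_scorer as above, would probably let n_jobs=2 work. That is an assumption drawn from the error message and has not been tested against this script.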
