error_score=nan issues hidden warnings in model selection utilities when n_jobs>1 #20475

Closed
ogrisel opened this issue Jul 6, 2021 · 6 comments · Fixed by #20619

@ogrisel
Member
ogrisel commented Jul 6, 2021

By default, many model selection tools such as cross_validate, validation_curve and the *SearchCV estimators catch exceptions raised during fit, issue a warning and score the model with nan. This can be a convenient behavior, especially for *SearchCV estimators that may explore invalid hyper-parameter combinations.
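A minimal reproducer of that default behavior (illustrative only; the exact warning text and how failures are reported depend on the scikit-learn version):

```python
# An invalid hyper-parameter makes every fit for that candidate fail, so with
# the default error_score=np.nan each split is scored as NaN and a warning is
# emitted by the worker that ran the failing fit.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

X, y = make_classification(random_state=0)

# C must be positive; C=-1 raises ValueError inside fit.
search = GridSearchCV(
    LogisticRegression(),
    param_grid={"C": [-1, 1]},
    error_score=np.nan,  # the default
    n_jobs=2,            # fits run in loky worker processes
)
search.fit(X, y)
print(search.cv_results_["mean_test_score"])  # NaN for the C=-1 candidate
```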

However, the warnings can be hidden when they are issued on the stderr of the loky workers (Python subprocesses), for instance in Jupyter interactive environments:

[screenshot: Jupyter notebook cell output where the warnings issued by the workers are not displayed]

This is a big usability bug. I think we should refactor the model selection tools to raise the warning from the main process instead of the workers (see the sketch after this list). This would make it possible to:

  • display the warning on the main process stderr, so it is no longer hidden when computing in parallel across several Python processes (e.g. with the default loky backend of joblib) or across several machines (e.g. with the dask or ray cluster backends of joblib);
  • raise the warning only once (e.g. only for the first failure) to avoid the "wall of warnings" effect that currently happens when n_jobs=1;
  • inform the user in the warning message that they can set error_score="raise" if they want their code to raise an exception instead of getting nan-valued scores.
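For illustration, a rough sketch of how this could work (not scikit-learn's actual code; the helper names `_fit_and_score_with_warnings` and `run_parallel` and the `fit_and_score` callable are made up): capture the warnings inside the workers and re-issue them, deduplicated, from the main process.

```python
import warnings

from joblib import Parallel, delayed


def _fit_and_score_with_warnings(fit_and_score, *args, **kwargs):
    # Record warnings inside the worker instead of letting them go to the
    # worker's stderr, where interactive frontends may not show them.
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        result = fit_and_score(*args, **kwargs)
    return result, [(w.category, str(w.message)) for w in caught]


def run_parallel(fit_and_score, tasks, n_jobs=2):
    out = Parallel(n_jobs=n_jobs)(
        delayed(_fit_and_score_with_warnings)(fit_and_score, *task) for task in tasks
    )
    results, warning_lists = zip(*out) if out else ((), ())
    # Re-issue the captured warnings from the main process so they are
    # visible, deduplicating identical messages to avoid a "wall of warnings".
    seen = set()
    for category, message in (w for ws in warning_lists for w in ws):
        if message not in seen:
            seen.add(message)
            warnings.warn(message, category)
    return list(results)
```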

This problem has been causing a lot of confusion for MOOC participants (INRIA/scikit-learn-mooc#377), so it is probably hurting scikit-learn's usability significantly.

@ogrisel ogrisel added the Bug label Jul 6, 2021
@ogrisel ogrisel changed the title from "error_score=nan issue hidden warning in model selection utilities when n_jobs>1" to "error_score=nan issues hidden warnings in model selection utilities when n_jobs>1" on Jul 6, 2021
@lesteve lesteve self-assigned this Jul 6, 2021
@lesteve
Member
lesteve commented Jul 6, 2021

I'll try to take a look at this one.

@ogrisel
Member Author
ogrisel commented Jul 6, 2021

It probably means we need to wrap joblib.Parallel into some kind of ModelEvaluation result object that can hold both the traditional attributes and any warning messages together with their traceback info.
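For illustration, such a container could look roughly like this (the class name and fields are hypothetical, not an actual scikit-learn class):

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple, Type


@dataclass
class ModelEvaluationResult:
    """Hypothetical per-fit result returned by each worker."""

    test_scores: Dict[str, float]
    fit_time: float
    score_time: float
    # Warnings captured inside the worker, stored as (category, formatted
    # message including traceback) so the main process can re-issue them.
    caught_warnings: List[Tuple[Type[Warning], str]] = field(default_factory=list)
```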

@lesteve
Member
lesteve commented Jul 9, 2021

Looking a bit more at it, the first thing I am going to do is the simple case: when all the test scores are NaN, raise a warning in the main process saying that something is probably wrong with the model configuration and that `error_score='raise'` is a good way to debug the error.
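Roughly along these lines (an illustrative sketch with a made-up helper name, not the code that eventually landed in #20619):

```python
import warnings

import numpy as np


def warn_if_all_fits_failed(test_scores, error_score):
    # test_scores: array of per-split test scores gathered in the main process.
    if np.isnan(test_scores).all():
        warnings.warn(
            "All the fits failed and were scored with "
            f"error_score={error_score!r}. The model configuration is "
            "probably invalid; set error_score='raise' to get the full "
            "exception and traceback.",
            UserWarning,
        )
```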

I will leave the more complicated cases for later:

  • find a way to collect the warnings from the subprocesses and raise them in the main process
  • avoid the "wall of warnings" effect, i.e. reduce the number of warnings since they are all the same

@lesteve
Member
lesteve commented Jul 9, 2021

Hmmm, actually there is another subtlety here: ipykernel 6 (released quite recently, on June 30 2021) fixed subprocess stderr not being captured, so you do see the warnings inside the Jupyter notebook with ipykernel 6: https://github.com/ipython/ipykernel/blob/master/CHANGELOG.md#600

All outputs to stdout/stderr should now be captured, including subprocesses and output of compiled libraries (blas, lapack....). In notebook server, some outputs that would previously go to the notebooks logs will now both head to notebook logs and in notebooks outputs. In terminal frontend like Jupyter Console, Emacs or other, this may ends up as duplicated outputs.

ipykernel 5.5.5 (no warnings shown)


ipykernel 6.1 (all warnings shown)


@senisioi

This problem seems related to #12939, and overall it stems from the usage of the warnings library, which is not thread-safe. The solution would be to use the logging library and set the correct logging level when running the application.
I would gladly make a pull request with changes that refactor warnings.warn to logger.warning from the logging library.
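For illustration, the suggested refactor would look roughly like this (a hypothetical sketch of the proposal, not something merged for this issue):

```python
import logging

logger = logging.getLogger("sklearn.model_selection")


def report_fit_failure(exc, error_score):
    # A logger configured once in the main process (e.g. via
    # logging.basicConfig) is easier to route than warnings printed on a
    # worker's stderr, although logging handlers are still per-process.
    logger.warning("Estimator fit failed, scoring with %r: %s", error_score, exc)
```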

@lesteve
Member
lesteve commented Jul 20, 2021

Thanks for your input, I am planning to work on this issue as I mentioned above.

Don't worry though, if you are looking to contribute to scikit-learn, there should be plenty of other issues to work on 😉. This part of the doc should help you get started as well.
