`error_score=nan` issues hidden warnings in model selection utilities when n_jobs>1 · Issue #20475 · scikit-learn/scikit-learn · GitHub
Closed
@ogrisel

Description

By default, many model selection tools such as cross_validate, validation_curve and *SearchCV catch exceptions raised while fitting, issue a warning and score the model with nan. This can be a nice behavior, especially for *SearchCV estimators, which can explore invalid hyper-parameter combinations.
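
To make the default behavior concrete, here is a minimal sketch. `FailsWhenFlagged` is a hypothetical toy estimator invented for this illustration (it is not part of scikit-learn); it raises in `fit` for one of the two candidates so that the search has a failing hyper-parameter combination to catch. The sketch uses `n_jobs=1` so that the warning is visible in the main process.

```python
import warnings

import numpy as np
from sklearn.base import BaseEstimator, ClassifierMixin
from sklearn.exceptions import FitFailedWarning
from sklearn.model_selection import GridSearchCV


class FailsWhenFlagged(ClassifierMixin, BaseEstimator):
    """Hypothetical toy classifier whose fit fails when fail=True."""

    def __init__(self, fail=False):
        self.fail = fail

    def fit(self, X, y):
        if self.fail:
            raise ValueError("invalid hyper-parameter combination")
        self.classes_ = np.unique(y)
        return self

    def predict(self, X):
        # Always predict the first class; enough to get a finite score.
        return np.full(len(X), self.classes_[0])


X = np.zeros((8, 1))
y = np.array([0, 1] * 4)

search = GridSearchCV(
    FailsWhenFlagged(),
    param_grid={"fail": [False, True]},
    cv=2,
    error_score=np.nan,  # catch fit errors and score the candidate as nan
    refit=False,
    n_jobs=1,
)

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    search.fit(X, y)

mean_scores = search.cv_results_["mean_test_score"]
fit_failed = [w for w in caught if issubclass(w.category, FitFailedWarning)]
print(mean_scores)      # finite score for fail=False, nan for fail=True
print(len(fit_failed))  # at least one FitFailedWarning was recorded
```

Passing `error_score="raise"` instead makes `search.fit` propagate the `ValueError` rather than recording a nan score.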

However, the warnings can be hidden when they are issued on the stderr of loky workers (Python subprocesses), for instance in Jupyter interactive environments:

image

This is a big usability bug. I think we should refactor the model selection tools to raise the warning from the main process instead of the workers. This would make it possible to:

  • display the warning on the main process stderr, to avoid hiding it when the computation runs in parallel on several Python processes (e.g. with the default loky backend of joblib) or on several machines (e.g. with the dask or ray cluster backends of joblib).
  • raise the warning only once (for the first case) to avoid a "wall of warnings" effect when n_jobs=1
  • inform the user, in the warning message, that they can set error_score="raise" if they want their code to raise an exception instead of getting nan-valued scores.
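
The proposal above could be sketched roughly as follows. This is not scikit-learn's implementation: `fit_and_score` and `run_search` are hypothetical names, the "score" is a placeholder formula, and a standard-library thread pool stands in for joblib workers. The idea is that workers return the error alongside a nan score instead of warning themselves, and the caller issues a single aggregated warning.

```python
import math
import warnings
from concurrent.futures import ThreadPoolExecutor


def fit_and_score(params):
    """Hypothetical worker task: report failures as data instead of warning."""
    try:
        if params["C"] <= 0:
            raise ValueError(f"C must be positive, got {params['C']}")
        return 1.0 / (1.0 + params["C"]), None  # placeholder "score"
    except Exception as exc:
        # Score the failed fit as nan and ship the error back to the caller.
        return math.nan, f"{type(exc).__name__}: {exc}"


def run_search(candidates, n_jobs=2):
    """Run all candidates in workers, then warn once from the calling side."""
    with ThreadPoolExecutor(max_workers=n_jobs) as pool:
        results = list(pool.map(fit_and_score, candidates))

    scores = [score for score, _ in results]
    errors = [error for _, error in results if error is not None]
    if errors:
        # One warning for all failures, showing only the first error,
        # with a hint mirroring the error_score="raise" suggestion.
        warnings.warn(
            f"{len(errors)} of {len(results)} fits failed and were scored as nan. "
            'Set error_score="raise" to raise the exception instead. '
            f"First error: {errors[0]}",
            RuntimeWarning,
        )
    return scores
```

Calling `run_search([{"C": 1.0}, {"C": -1.0}, {"C": 0.0}])` issues one `RuntimeWarning` from the caller, regardless of how many candidates failed or where they ran.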

This problem has been causing a lot of confusion to MOOC participants (INRIA/scikit-learn-mooc#377) so it's probably hurting scikit-learn usability significantly.
