Description
By default many model selection tools such as cross_validate
, validation_curves
and *SearchCV
catch exceptions, raise a warning and score the model with nan
. This can be a nice behavior, especially for *SearchCV
estimators that can explore invalid hyper-parameter combinations.
However the warnings can be hidden when they are issued on the stderr of loky workers (python subprocesses), for instance in Jupyter interactive environments:
This is a big usability bug. I think we should refactor the model selection tools to raise the warning from the main process instead of the workers. This would make it possible to:
- make it possible to display the warning on the main process stderr to avoid hiding those when parallel compution on several Python processes (e.g. with the default loky backend of joblib) or multi-machine (e.g. with the dask or ray cluster backends of joblib).
- raise the warning only once (for the first case) to avoid a "wall of warnings"-effect when
n_jobs=1
- inform the user that can set
error_score="raise"
in the warning message if they want there code to raise an exception instead of getttingnan
-valued scores.
This problem has been causing a lot of confusion to MOOC participants (INRIA/scikit-learn-mooc#377) so it's probably hurting scikit-learn usability significantly.