Avoid ValueError in parallel computing of large arrays
This PR introduces an optional *max_nbytes* parameter on the *OneVsRestClassifier*, *OneVsOneClassifier*, and *OutputCodeClassifier* multiclass learning algorithms in *multiclass.py*.
This parameter complements the existing *n_jobs* parameter and is useful when a large training set is processed by
concurrently running jobs, i.e. *n_jobs* > 1 or *n_jobs* = -1
(meaning that the number of jobs is set to the number of CPU cores). In that case,
[Parallel](https://joblib.readthedocs.io/en/latest/parallel.html#parallel-reference-documentation)
is called with the default "loky" backend, which [implements
multi-processing](https://joblib.readthedocs.io/en/latest/parallel.html#thread-based-parallelism-vs-process-based-parallelism);
*Parallel* also applies a default 1-megabyte
[threshold](https://joblib.readthedocs.io/en/latest/parallel.html#automated-array-to-memmap-conversion)
above which arrays passed to the workers are converted to read-only memory maps. This threshold may be
too small for large arrays and can break the job with
**ValueError: UPDATEIFCOPY base is read-only**. *Parallel* controls
this threshold through its *max_nbytes* argument. With this fix, the multiclass
classifiers optionally expose it, so users can customize the maximum
size of arrays passed to the workers.
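
As a minimal sketch of the threshold involved, the snippet below uses joblib's *Parallel* directly; the `max_nbytes='100M'` value is only an example, and the commented-out `OneVsRestClassifier(..., max_nbytes=...)` call shows the API proposed by this PR, which is hypothetical until the PR is merged.

```python
# Minimal sketch of the memmapping threshold this PR makes configurable.
# Assumes joblib is installed; the classifier keyword shown in the trailing
# comments is the API proposed by this PR, not an existing scikit-learn one.
import numpy as np
from joblib import Parallel, delayed

X = np.random.rand(200_000, 50)  # ~80 MB, well above joblib's 1 MB default

def column_means(block):
    return block.mean(axis=0)

# With the default max_nbytes='1M', array arguments larger than 1 MB are
# dumped to read-only memmaps before being handed to the "loky" workers.
# Raising the threshold (or passing max_nbytes=None) keeps the arrays in
# regular pickled form and avoids read-only errors in the workers.
means = Parallel(n_jobs=-1, max_nbytes='100M')(
    delayed(column_means)(X[i::4]) for i in range(4)
)

# Proposed usage once this PR is applied (hypothetical until merged):
# from sklearn.multiclass import OneVsRestClassifier
# from sklearn.svm import LinearSVC
# clf = OneVsRestClassifier(LinearSVC(), n_jobs=-1, max_nbytes='100M')
# clf.fit(X_large, y_large)
```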
Fixes scikit-learn#6614
Expected to also fix scikit-learn#4597