Description
Before adding OpenMP based parallelism we need to decide how to control the number of threads and how to expose it in the public API.
I've seen several proposals from different people:
(1) Use the existing `n_jobs` public parameter, with `None` meaning 1 (same as for joblib parallelism).
(2) Use the existing `n_jobs` public parameter, with `None` meaning -1 (like NumPy, which lets BLAS use as many threads as possible).
(3) Add a new public parameter `n_omp_threads`, used when the underlying parallelism is handled by OpenMP, with `None` meaning 1.
(4) Add a new public parameter `n_omp_threads`, used when the underlying parallelism is handled by OpenMP, with `None` meaning -1.
(5) Do not expose this in the public API and use as many threads as possible. The user still has some control through `OMP_NUM_THREADS` before runtime or through threadpoolctl at runtime.
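To make the difference between the options concrete, here is a minimal sketch of how a public value could be turned into an actual thread count. Both helpers (`resolve_n_threads`, `default_n_threads`) are hypothetical and not part of scikit-learn. The `none_means` argument switches between the "`None` means 1" options (1)/(3) and the "`None` means -1" options (2)/(4); negative values follow the joblib convention, where -1 means all CPUs and -2 means all but one. `default_n_threads` approximates option (5) in pure Python by reading `OMP_NUM_THREADS`; the real OpenMP runtime reads that variable itself.

```python
import os


def resolve_n_threads(n_jobs=None, none_means=1):
    """Hypothetical mapping from a public n_jobs / n_omp_threads value
    to a concrete number of threads.

    none_means=1  -> options (1) and (3)
    none_means=-1 -> options (2) and (4)
    """
    n_cpus = os.cpu_count() or 1
    if n_jobs is None:
        n_jobs = none_means
    if n_jobs == 0:
        raise ValueError("n_jobs == 0 has no meaning")
    if n_jobs < 0:
        # joblib convention: -1 -> all CPUs, -2 -> all but one, ...
        return max(1, n_cpus + 1 + n_jobs)
    return n_jobs


def default_n_threads():
    """Hypothetical option (5): no public parameter. Use all CPUs unless
    OMP_NUM_THREADS was set before launching Python."""
    env = os.environ.get("OMP_NUM_THREADS")
    if env:
        # OMP_NUM_THREADS may be a comma-separated list for nested
        # parallel regions; the first entry applies to the outermost level.
        return int(env.split(",")[0])
    return os.cpu_count() or 1
```

With options (1)/(3), an estimator is sequential unless the user opts in. With (2)/(4) and (5), it uses the whole machine by default, as NumPy does with BLAS.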
(1) or (2) will require improving the documentation of `n_jobs` for each estimator: what the default is, what kind of parallelism is used, what is done in parallel... (see #14228)
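As an illustration of the kind of per-estimator documentation that (1) or (2) would call for, here is a sketch of an `n_jobs` entry in numpydoc style that answers the three questions (default, kind of parallelism, what runs in parallel). The estimator name and the parallelized step are hypothetical placeholders, not an existing scikit-learn class.

```python
class HypotheticalOpenMPEstimator:
    """Illustrative estimator (hypothetical) documenting n_jobs explicitly.

    Parameters
    ----------
    n_jobs : int, default=None
        Number of OpenMP threads used to compute the pairwise distances
        in ``fit`` and ``predict``. This is thread-based parallelism
        inside compiled code, not joblib process-based parallelism.
        ``None`` means 1. ``-1`` means using all processors.
    """

    def __init__(self, n_jobs=None):
        # Stored as-is; resolved to a thread count only at fit time.
        self.n_jobs = n_jobs
```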
@scikit-learn/core-devs, which solution do you prefer?
If it's none of the above, what's your solution?