-
-
Notifications
You must be signed in to change notification settings - Fork 25.9k
[MRG+2] DOC Correct default n_jobs & reference the glossary #11808
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor comments below, otherwise LGTM.
sklearn/cluster/bicluster.py
Outdated
The number of jobs to use for the computation. This works by breaking | ||
down the pairwise matrix into n_jobs even slices and computing them in | ||
parallel. | ||
|
||
If ``None``, then the number of jobs is set to 1 unless current joblib |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe remove "then" and possibly "unless the current"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I meant: possibly add "the" in "unless current"..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@rth So you mean something like: If None
, the number of jobs is set to 1 unless the current joblib backend context specifies otherwise?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, but let's wait for @jnothman 's opinion before propagating these changes everywhere..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is okay, but seems very verbose. This seems sufficient:
"Number of parallel jobs to run. None means 1 unless in a joblib.parallel_backend context. -1 means use all processors. See the Glossary for more."
Number of jobs to run in parallel (default 1). | ||
n_jobs : int or None, optional (default=None) | ||
Number of jobs to run in parallel. | ||
``None`` means 1 unless in a ``joblib.parallel_backend`` context. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can use :obj:`joblib.parallel_backend`
to create an intersphinx link
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder whether we would keep people happier by retaining "If 1, no job-level parallelism is used, which is useful for debugging." But ... whatever.
n_jobs : int, | ||
number of parallel jobs to run | ||
n_jobs : int or None, optional (default=None) | ||
number of parallel jobs to run. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a bit awkward to keep this lowercase
@@ -2186,10 +2191,11 @@ class MultiTaskLassoCV(LinearModelCV, RegressorMixin): | |||
verbose : bool or integer | |||
Amount of verbosity. | |||
|
|||
n_jobs : integer, optional | |||
Number of CPUs to use during the cross validation. If ``-1``, use | |||
all the CPUs. Note that this is used only if multiple values for |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You missed this note.
@@ -2014,10 +2018,11 @@ class MultiTaskElasticNetCV(LinearModelCV, RegressorMixin): | |||
verbose : bool or integer | |||
Amount of verbosity. | |||
|
|||
n_jobs : integer, optional | |||
Number of CPUs to use during the cross validation. If ``-1``, use | |||
all the CPUs. Note that this is used only if multiple values for |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You missed this note
sklearn/linear_model/base.py
Outdated
The number of jobs to use for the computation. | ||
If -1 all CPUs are used. This will only provide speedup for |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You missed this note
@jnothman I add it to the glossary. I'll make similar changes everywhere if you decide to do so. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
otherwise LGTM
@@ -1479,8 +1479,9 @@ functions or non-estimator constructors. | |||
|
|||
``n_jobs`` is an int, specifying the maximum number of concurrently | |||
running jobs. If set to -1, all CPUs are used. If 1 is given, no | |||
parallel computing code is used at all. For n_jobs below -1, (n_cpus + |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The statement already here is not quite true. Better to say no job-level parallelism or something, and perhaps to note that other parallelism may be done by the numerical processing libraries (as per the FAQ's coverage of this)...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor comment below, otherwise LGTM.
doc/glossary.rst
Outdated
@@ -1479,8 +1479,9 @@ functions or non-estimator constructors. | |||
|
|||
``n_jobs`` is an int, specifying the maximum number of concurrently | |||
running jobs. If set to -1, all CPUs are used. If 1 is given, no | |||
parallel computing code is used at all. For n_jobs below -1, (n_cpus + | |||
1 + n_jobs) are used. Thus for n_jobs = -2, all CPUs but one are used. | |||
job-level parallel computing code is used at all, which is useful for |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe "no joblib level parallelism is used at all".
I also think it's important to emphasize that other parallelism may be used as mentionned above with a link to the FAQ, as a number of users don't seem to be aware of it.
Possibly,
"""
Even with n_jobs = 1, parallelism may occur be due to numerical processing libraries (see FAQ#section_on_parallelism).
"""
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree, I'm not aware of it.
This version is a bit duplicate with the first paragraph, but I think it's beneficial to emphasize the meaning of n_jobs=1
.
Thanks @qinhanmin2014 |
part of #11771
Correct default n_jobs (n_jobs=None is not the same as n_jobs=1) and reference the glossary.
ping @jnothman @rth this is my solution. If you like it, I'll make similar changes in the whole repo.