8000 [MRG+2] DOC Correct default n_jobs & reference the glossary by qinhanmin2014 · Pull Request #11808 · scikit-learn/scikit-learn · GitHub

Merged: 9 commits, Aug 18, 2018
7 changes: 5 additions & 2 deletions doc/glossary.rst
@@ -1479,8 +1479,11 @@ functions or non-estimator constructors.

     ``n_jobs`` is an int, specifying the maximum number of concurrently
     running jobs. If set to -1, all CPUs are used. If 1 is given, no
-    parallel computing code is used at all. For n_jobs below -1, (n_cpus +
-    1 + n_jobs) are used. Thus for n_jobs = -2, all CPUs but one are used.
+    joblib level parallelism is used at all, which is useful for
+    debugging. Even with ``n_jobs = 1``, parallelism may occur due to
+    numerical processing libraries (see :ref:`FAQ <faq_mkl_threading>`).
+    For n_jobs below -1, (n_cpus + 1 + n_jobs) are used. Thus for
+    ``n_jobs = -2``, all CPUs but one are used.

Review comment (Member, on the removed text): The statement already here is not quite true. Better to say no job-level parallelism or something, and perhaps to note that other parallelism may be done by the numerical processing libraries (as per the FAQ's coverage of this)...

``n_jobs=None`` means *unset*; it will generally be interpreted as
``n_jobs=1``, unless the current :class:`joblib.Parallel` backend
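The n_jobs arithmetic described in the glossary entry can be sketched as a small helper. This is purely illustrative and not scikit-learn's actual implementation (the real resolution happens inside joblib):

```python
import os

def resolve_n_jobs(n_jobs, n_cpus=None):
    """Illustrative sketch of the glossary's n_jobs convention."""
    if n_cpus is None:
        n_cpus = os.cpu_count()
    if n_jobs is None:
        # "unset": generally interpreted as 1, unless a joblib
        # parallel_backend context specifies otherwise.
        return 1
    if n_jobs < 0:
        # -1 -> all CPUs; below -1 -> n_cpus + 1 + n_jobs,
        # so -2 means all CPUs but one.
        return n_cpus + 1 + n_jobs
    return n_jobs

print(resolve_n_jobs(-1, n_cpus=8))    # 8: all CPUs
print(resolve_n_jobs(-2, n_cpus=8))    # 7: all CPUs but one
print(resolve_n_jobs(None, n_cpus=8))  # 1: unset
```

Note that even with one joblib job, BLAS-level threading may still run in parallel, as the new glossary text points out.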
9 changes: 6 additions & 3 deletions examples/model_selection/plot_learning_curve.py
@@ -25,7 +25,7 @@


 def plot_learning_curve(estimator, title, X, y, ylim=None, cv=None,
-                        n_jobs=1, train_sizes=np.linspace(.1, 1.0, 5)):
+                        n_jobs=None, train_sizes=np.linspace(.1, 1.0, 5)):
"""
Generate a simple plot of the test and training learning curve.

@@ -63,8 +63,11 @@ def plot_learning_curve(estimator, title, X, y, ylim=None, cv=None,
Refer :ref:`User Guide <cross_validation>` for the various
cross-validators that can be used here.

-    n_jobs : integer, optional
-        Number of jobs to run in parallel (default 1).
+    n_jobs : int or None, optional (default=None)
+        Number of jobs to run in parallel.
+        ``None`` means 1 unless in a :obj:`joblib.parallel_backend` context.
+        ``-1`` means using all processors. See :term:`Glossary <n_jobs>`
+        for more details.

train_sizes : array-like, shape (n_ticks,), dtype float or int
Relative or absolute numbers of training examples that will be used to
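The example function above wraps sklearn.model_selection.learning_curve, which takes the same n_jobs=None default. A minimal sketch on synthetic data (sizes and the classifier choice are illustrative, not part of this diff):

```python
import numpy as np
from sklearn.model_selection import learning_curve
from sklearn.tree import DecisionTreeClassifier

rng = np.random.RandomState(0)
X = rng.rand(60, 2)
y = (X[:, 0] > 0.5).astype(int)

# n_jobs=None -> one joblib job unless a parallel_backend context overrides it
train_sizes, train_scores, test_scores = learning_curve(
    DecisionTreeClassifier(random_state=0), X, y,
    cv=3, n_jobs=None, train_sizes=np.linspace(0.1, 1.0, 5))
print(train_sizes.shape, train_scores.shape)  # (5,) (5, 3)
```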
18 changes: 8 additions & 10 deletions sklearn/cluster/bicluster.py
@@ -228,15 +228,14 @@ class SpectralCoclustering(BaseSpectral):
chosen and the algorithm runs once. Otherwise, the algorithm
is run for each initialization and the best solution chosen.

-    n_jobs : int, optional, default: 1
+    n_jobs : int or None, optional (default=None)
         The number of jobs to use for the computation. This works by breaking
         down the pairwise matrix into n_jobs even slices and computing them in
         parallel.

-        If -1 all CPUs are used. If 1 is given, no parallel computing code is
-        used at all, which is useful for debugging. For n_jobs below -1,
-        (n_cpus + 1 + n_jobs) are used. Thus for n_jobs = -2, all CPUs but one
-        are used.
+        ``None`` means 1 unless in a :obj:`joblib.parallel_backend` context.
+        ``-1`` means using all processors. See :term:`Glossary <n_jobs>`
+        for more details.

random_state : int, RandomState instance or None (default)
Used for randomizing the singular value decomposition and the k-means
@@ -375,15 +374,14 @@ class SpectralBiclustering(BaseSpectral):
chosen and the algorithm runs once. Otherwise, the algorithm
is run for each initialization and the best solution chosen.

-    n_jobs : int, optional, default: 1
+    n_jobs : int or None, optional (default=None)
         The number of jobs to use for the computation. This works by breaking
         down the pairwise matrix into n_jobs even slices and computing them in
         parallel.

-        If -1 all CPUs are used. If 1 is given, no parallel computing code is
-        used at all, which is useful for debugging. For n_jobs below -1,
-        (n_cpus + 1 + n_jobs) are used. Thus for n_jobs = -2, all CPUs but one
-        are used.
+        ``None`` means 1 unless in a :obj:`joblib.parallel_backend` context.
+        ``-1`` means using all processors. See :term:`Glossary <n_jobs>`
+        for more details.

random_state : int, RandomState instance or None (default)
Used for randomizing the singular value decomposition and the k-means
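For context, a minimal SpectralCoclustering usage sketch. n_jobs is omitted here since it only changes how the internal k-means runs are scheduled, not the result (and later scikit-learn releases removed the parameter from the biclustering estimators); the data is arbitrary:

```python
import numpy as np
from sklearn.cluster import SpectralCoclustering

rng = np.random.RandomState(0)
# A small strictly positive matrix to bicluster.
X = rng.rand(10, 8) + 0.1
model = SpectralCoclustering(n_clusters=2, random_state=0).fit(X)
# rows_ / columns_ are boolean membership arrays per bicluster.
print(model.rows_.shape, model.columns_.shape)  # (2, 10) (2, 8)
```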
12 changes: 8 additions & 4 deletions sklearn/cluster/dbscan_.py
@@ -76,9 +76,11 @@ def dbscan(X, eps=0.5, min_samples=5, metric='minkowski', metric_params=None,
weight may inhibit its eps-neighbor from being core.
Note that weights are absolute, and default to 1.

-    n_jobs : int, optional (default = 1)
+    n_jobs : int or None, optional (default=None)
         The number of parallel jobs to run for neighbors search.
-        If ``-1``, then the number of jobs is set to the number of CPU cores.
+        ``None`` means 1 unless in a :obj:`joblib.parallel_backend` context.
+        ``-1`` means using all processors. See :term:`Glossary <n_jobs>`
+        for more details.

Returns
-------
@@ -229,9 +231,11 @@ class DBSCAN(BaseEstimator, ClusterMixin):
The power of the Minkowski metric to be used to calculate distance
between points.

-    n_jobs : int, optional (default = 1)
+    n_jobs : int or None, optional (default=None)
         The number of parallel jobs to run.
-        If ``-1``, then the number of jobs is set to the number of CPU cores.
+        ``None`` means 1 unless in a :obj:`joblib.parallel_backend` context.
+        ``-1`` means using all processors. See :term:`Glossary <n_jobs>`
+        for more details.

Attributes
----------
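In DBSCAN, n_jobs only parallelizes the neighbors search, so any value yields identical labels. A minimal sketch (the toy data follows the classic DBSCAN docstring example):

```python
import numpy as np
from sklearn.cluster import DBSCAN

X = np.array([[1, 2], [2, 2], [2, 3],
              [8, 7], [8, 8], [25, 80]], dtype=float)
# n_jobs=None -> a single joblib job for the neighbors search
db = DBSCAN(eps=3, min_samples=2, n_jobs=None).fit(X)
print(db.labels_)  # [ 0  0  0  1  1 -1]
```

The last point is labeled -1, i.e. noise: it has no neighbor within eps=3.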
18 changes: 8 additions & 10 deletions sklearn/cluster/k_means_.py
@@ -261,14 +261,13 @@ def k_means(X, n_clusters, sample_weight=None, init='k-means++',
the data mean, in this case it will also not ensure that data is
C-contiguous which may cause a significant slowdown.

-    n_jobs : int
+    n_jobs : int or None, optional (default=None)
         The number of jobs to use for the computation. This works by computing
         each of the n_init runs in parallel.

-        If -1 all CPUs are used. If 1 is given, no parallel computing code is
-        used at all, which is useful for debugging. For n_jobs below -1,
-        (n_cpus + 1 + n_jobs) are used. Thus for n_jobs = -2, all CPUs but one
-        are used.
+        ``None`` means 1 unless in a :obj:`joblib.parallel_backend` context.
+        ``-1`` means using all processors. See :term:`Glossary <n_jobs>`
+        for more details.

algorithm : "auto", "full" or "elkan", default="auto"
K-means algorithm to use. The classical EM-style algorithm is "full".
@@ -834,14 +833,13 @@ class KMeans(BaseEstimator, ClusterMixin, TransformerMixin):
the data mean, in this case it will also not ensure that data is
C-contiguous which may cause a significant slowdown.

-    n_jobs : int
+    n_jobs : int or None, optional (default=None)
         The number of jobs to use for the computation. This works by computing
         each of the n_init runs in parallel.

-        If -1 all CPUs are used. If 1 is given, no parallel computing code is
-        used at all, which is useful for debugging. For n_jobs below -1,
-        (n_cpus + 1 + n_jobs) are used. Thus for n_jobs = -2, all CPUs but one
-        are used.
+        ``None`` means 1 unless in a :obj:`joblib.parallel_backend` context.
+        ``-1`` means using all processors. See :term:`Glossary <n_jobs>`
+        for more details.

algorithm : "auto", "full" or "elkan", default="auto"
K-means algorithm to use. The classical EM-style algorithm is "full".
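As the docstring says, KMeans's n_jobs parallelized the independent n_init restarts. A minimal sketch on well-separated toy data (n_jobs is omitted since it only affects scheduling, and later scikit-learn releases removed it from KMeans):

```python
import numpy as np
from sklearn.cluster import KMeans

X = np.array([[1, 2], [1, 4], [1, 0],
              [10, 2], [10, 4], [10, 0]], dtype=float)
# Each of the n_init=10 restarts is an independent run of Lloyd's algorithm;
# n_jobs is what used to spread these restarts over joblib workers.
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(sorted(km.cluster_centers_[:, 0]))  # [1.0, 10.0]
```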
24 changes: 12 additions & 12 deletions sklearn/cluster/mean_shift_.py
@@ -53,9 +53,11 @@ def estimate_bandwidth(X, quantile=0.3, n_samples=None, random_state=0,
deterministic.
See :term:`Glossary <random_state>`.

-    n_jobs : int, optional (default = 1)
+    n_jobs : int or None, optional (default=None)
         The number of parallel jobs to run for neighbors search.
-        If ``-1``, then the number of jobs is set to the number of CPU cores.
+        ``None`` means 1 unless in a :obj:`joblib.parallel_backend` context.
+        ``-1`` means using all processors. See :term:`Glossary <n_jobs>`
+        for more details.

Returns
-------
@@ -152,14 +154,13 @@ def mean_shift(X, bandwidth=None, seeds=None, bin_seeding=False,
Maximum number of iterations, per seed point before the clustering
operation terminates (for that seed point), if has not converged yet.

-    n_jobs : int
+    n_jobs : int or None, optional (default=None)
         The number of jobs to use for the computation. This works by computing
         each of the n_init runs in parallel.

-        If -1 all CPUs are used. If 1 is given, no parallel computing code is
-        used at all, which is useful for debugging. For n_jobs below -1,
-        (n_cpus + 1 + n_jobs) are used. Thus for n_jobs = -2, all CPUs but one
-        are used.
+        ``None`` means 1 unless in a :obj:`joblib.parallel_backend` context.
+        ``-1`` means using all processors. See :term:`Glossary <n_jobs>`
+        for more details.

.. versionadded:: 0.17
Parallel Execution using *n_jobs*.
@@ -334,14 +335,13 @@ class MeanShift(BaseEstimator, ClusterMixin):
not within any kernel. Orphans are assigned to the nearest kernel.
If false, then orphans are given cluster label -1.

-    n_jobs : int
+    n_jobs : int or None, optional (default=None)
         The number of jobs to use for the computation. This works by computing
         each of the n_init runs in parallel.

-        If -1 all CPUs are used. If 1 is given, no parallel computing code is
-        used at all, which is useful for debugging. For n_jobs below -1,
-        (n_cpus + 1 + n_jobs) are used. Thus for n_jobs = -2, all CPUs but one
-        are used.
+        ``None`` means 1 unless in a :obj:`joblib.parallel_backend` context.
+        ``-1`` means using all processors. See :term:`Glossary <n_jobs>`
+        for more details.

Attributes
----------
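Both estimate_bandwidth and MeanShift accept n_jobs for their neighbors searches; results do not depend on it. A minimal sketch on two synthetic blobs (the blob layout is an illustrative assumption):

```python
import numpy as np
from sklearn.cluster import MeanShift, estimate_bandwidth

rng = np.random.RandomState(0)
# Two well-separated Gaussian blobs of 20 points each.
X = np.vstack([rng.randn(20, 2), rng.randn(20, 2) + 10])

# n_jobs=None -> single joblib job for both neighbors searches
bandwidth = estimate_bandwidth(X, quantile=0.3, n_jobs=None)
ms = MeanShift(bandwidth=bandwidth, n_jobs=None).fit(X)
print(ms.labels_.shape)  # (40,)
```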
12 changes: 8 additions & 4 deletions sklearn/cluster/optics_.py
@@ -118,9 +118,11 @@ def optics(X, min_samples=5, max_bound=np.inf, metric='euclidean',
required to store the tree. The optimal value depends on the
nature of the problem.

-    n_jobs : int, optional (default=1)
+    n_jobs : int or None, optional (default=None)
         The number of parallel jobs to run for neighbors search.
-        If ``-1``, then the number of jobs is set to the number of CPU cores.
+        ``None`` means 1 unless in a :obj:`joblib.parallel_backend` context.
+        ``-1`` means using all processors. See :term:`Glossary <n_jobs>`
+        for more details.

Returns
-------
@@ -243,9 +245,11 @@ class OPTICS(BaseEstimator, ClusterMixin):
required to store the tree. The optimal value depends on the
nature of the problem.

-    n_jobs : int, optional (default=1)
+    n_jobs : int or None, optional (default=None)
         The number of parallel jobs to run for neighbors search.
-        If ``-1``, then the number of jobs is set to the number of CPU cores.
+        ``None`` means 1 unless in a :obj:`joblib.parallel_backend` context.
+        ``-1`` means using all processors. See :term:`Glossary <n_jobs>`
+        for more details.

Attributes
----------
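A minimal OPTICS usage sketch. Note the signature in this diff (max_bound) predates the released API, where that parameter became max_eps; the sketch below sticks to stable arguments and illustrative data:

```python
import numpy as np
from sklearn.cluster import OPTICS

rng = np.random.RandomState(0)
X = np.vstack([rng.randn(20, 2), rng.randn(20, 2) + 8])
# n_jobs=None -> serial neighbors search; any value gives identical labels
clust = OPTICS(min_samples=5, n_jobs=None).fit(X)
print(clust.labels_.shape)  # (40,)
```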
6 changes: 4 additions & 2 deletions sklearn/cluster/spectral.py
@@ -358,9 +358,11 @@ class SpectralClustering(BaseEstimator, ClusterMixin):
Parameters (keyword arguments) and values for kernel passed as
callable object. Ignored by other kernels.

-    n_jobs : int, optional (default = 1)
+    n_jobs : int or None, optional (default=None)
         The number of parallel jobs to run.
-        If ``-1``, then the number of jobs is set to the number of CPU cores.
+        ``None`` means 1 unless in a :obj:`joblib.parallel_backend` context.
+        ``-1`` means using all processors. See :term:`Glossary <n_jobs>`
+        for more details.

Attributes
----------
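In SpectralClustering, n_jobs mainly matters with affinity="nearest_neighbors", where building the k-NN graph is parallelized. A minimal sketch on two illustrative blobs (a disconnected-graph warning is possible but harmless here):

```python
import numpy as np
from sklearn.cluster import SpectralClustering

rng = np.random.RandomState(0)
X = np.vstack([rng.randn(15, 2), rng.randn(15, 2) + 10])
sc = SpectralClustering(n_clusters=2, affinity="nearest_neighbors",
                        n_neighbors=10, n_jobs=None, random_state=0)
labels = sc.fit_predict(X)
print(labels.shape)  # (30,)
```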
14 changes: 10 additions & 4 deletions sklearn/compose/_column_transformer.py
@@ -93,8 +93,11 @@ class ColumnTransformer(_BaseComposition, TransformerMixin):
the stacked result will be sparse or dense, respectively, and this
keyword will be ignored.

-    n_jobs : int, optional
-        Number of jobs to run in parallel (default 1).
+    n_jobs : int or None, optional (default=None)
+        Number of jobs to run in parallel.
+        ``None`` means 1 unless in a :obj:`joblib.parallel_backend` context.
+        ``-1`` means using all processors. See :term:`Glossary <n_jobs>`
+        for more details.

transformer_weights : dict, optional
Multiplicative weights for features per transformer. The output of the
@@ -666,8 +669,11 @@ def make_column_transformer(*transformers, **kwargs):
non-specified columns will use the ``remainder`` estimator. The
estimator must support `fit` and `transform`.

-    n_jobs : int, optional
-        Number of jobs to run in parallel (default 1).
+    n_jobs : int or None, optional (default=None)
+        Number of jobs to run in parallel.
+        ``None`` means 1 unless in a :obj:`joblib.parallel_backend` context.
+        ``-1`` means using all processors. See :term:`Glossary <n_jobs>`
+        for more details.

Returns
-------
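In ColumnTransformer, n_jobs parallelizes fitting the per-column transformers. A minimal sketch with one scaled column group and the rest passed through (the data and column split are illustrative):

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import StandardScaler

X = np.array([[0., 1., 10.],
              [1., 3., 20.],
              [2., 5., 30.]])
# n_jobs=None -> transformers are fit in a single joblib job
ct = ColumnTransformer([("scale", StandardScaler(), [0, 1])],
                       remainder="passthrough", n_jobs=None)
Xt = ct.fit_transform(X)
print(Xt.shape)  # (3, 3)
```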
14 changes: 10 additions & 4 deletions sklearn/covariance/graph_lasso_.py
@@ -520,8 +520,11 @@ class GraphicalLassoCV(GraphicalLasso):
than number of samples. Elsewhere prefer cd which is more numerically
stable.

-    n_jobs : int, optional
-        number of jobs to run in parallel (default 1).
+    n_jobs : int or None, optional (default=None)
+        number of jobs to run in parallel.
+        ``None`` means 1 unless in a :obj:`joblib.parallel_backend` context.
+        ``-1`` means using all processors. See :term:`Glossary <n_jobs>`
+        for more details.

verbose : boolean, optional
If verbose is True, the objective function and duality gap are
@@ -927,8 +930,11 @@ class GraphLassoCV(GraphicalLassoCV):
than number of samples. Elsewhere prefer cd which is more numerically
stable.

-    n_jobs : int, optional
-        number of jobs to run in parallel (default 1).
+    n_jobs : int or None, optional (default=None)
+        number of jobs to run in parallel.
+        ``None`` means 1 unless in a :obj:`joblib.parallel_backend` context.
+        ``-1`` means using all processors. See :term:`Glossary <n_jobs>`
+        for more details.

verbose : boolean, optional
If verbose is True, the objective function and duality gap are
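For GraphicalLassoCV, n_jobs parallelizes cross-validation over the alpha grid. A minimal sketch on synthetic Gaussian data (dimensions and sample size are illustrative):

```python
import numpy as np
from sklearn.covariance import GraphicalLassoCV

rng = np.random.RandomState(0)
X = rng.multivariate_normal(np.zeros(3), np.eye(3), size=60)
# n_jobs=None -> the CV folds over the alpha grid run in one joblib job
model = GraphicalLassoCV(cv=3, n_jobs=None).fit(X)
print(model.covariance_.shape)  # (3, 3)
```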
40 changes: 29 additions & 11 deletions sklearn/decomposition/dict_learning.py
@@ -246,8 +246,11 @@ def sparse_encode(X, dictionary, gram=None, cov=None, algorithm='lasso_lars',
max_iter : int, 1000 by default
Maximum number of iterations to perform if `algorithm='lasso_cd'`.

-    n_jobs : int, optional
+    n_jobs : int or None, optional (default=None)
         Number of parallel jobs to run.
+        ``None`` means 1 unless in a :obj:`joblib.parallel_backend` context.
+        ``-1`` means using all processors. See :term:`Glossary <n_jobs>`
+        for more details.

check_input : boolean, optional
If False, the input arrays X and dictionary will not be checked.
@@ -459,8 +462,11 @@ def dict_learning(X, n_components, alpha, max_iter=100, tol=1e-8,
Lasso solution (linear_model.Lasso). Lars will be faster if
the estimated components are sparse.

-    n_jobs : int,
-        Number of parallel jobs to run, or -1 to autodetect.
+    n_jobs : int or None, optional (default=None)
+        Number of parallel jobs to run.
+        ``None`` means 1 unless in a :obj:`joblib.parallel_backend` context.
+        ``-1`` means using all processors. See :term:`Glossary <n_jobs>`
+        for more details.

dict_init : array of shape (n_components, n_features),
Initial value for the dictionary for warm restart scenarios.
@@ -654,8 +660,11 @@ def dict_learning_online(X, n_components=2, alpha=1, n_iter=100,
shuffle : boolean,
Whether to shuffle the data before splitting it in batches.

-    n_jobs : int,
-        Number of parallel jobs to run, or -1 to autodetect.
+    n_jobs : int or None, optional (default=None)
+        Number of parallel jobs to run.
+        ``None`` means 1 unless in a :obj:`joblib.parallel_backend` context.
+        ``-1`` means using all processors. See :term:`Glossary <n_jobs>`
+        for more details.

method : {'lars', 'cd'}
lars: uses the least angle regression method to solve the lasso problem
@@ -949,8 +958,11 @@ class SparseCoder(BaseEstimator, SparseCodingMixin):
its negative part and its positive part. This can improve the
performance of downstream classifiers.

-    n_jobs : int,
-        number of parallel jobs to run
+    n_jobs : int or None, optional (default=None)
+        Number of parallel jobs to run.
+        ``None`` means 1 unless in a :obj:`joblib.parallel_backend` context.
+        ``-1`` means using all processors. See :term:`Glossary <n_jobs>`
+        for more details.

positive_code : bool
Whether to enforce positivity when finding the code.
@@ -1069,8 +1081,11 @@ class DictionaryLearning(BaseEstimator, SparseCodingMixin):
the reconstruction error targeted. In this case, it overrides
`n_nonzero_coefs`.

-    n_jobs : int,
-        number of parallel jobs to run
+    n_jobs : int or None, optional (default=None)
+        Number of parallel jobs to run.
+        ``None`` means 1 unless in a :obj:`joblib.parallel_backend` context.
+        ``-1`` means using all processors. See :term:`Glossary <n_jobs>`
+        for more details.

code_init : array of shape (n_samples, n_components),
initial value for the code, for warm restart
@@ -1220,8 +1235,11 @@ class MiniBatchDictionaryLearning(BaseEstimator, SparseCodingMixin):
Lasso solution (linear_model.Lasso). Lars will be faster if
the estimated components are sparse.

-    n_jobs : int,
-        number of parallel jobs to run
+    n_jobs : int or None, optional (default=None)
+        Number of parallel jobs to run.
+        ``None`` means 1 unless in a :obj:`joblib.parallel_backend` context.
+        ``-1`` means using all processors. See :term:`Glossary <n_jobs>`
+        for more details.

batch_size : int,
number of samples in each mini-batch
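In sparse_encode, n_jobs splits the samples to encode across joblib jobs. A minimal sketch with a random normalized dictionary (sizes and the OMP settings are illustrative):

```python
import numpy as np
from sklearn.decomposition import sparse_encode

rng = np.random.RandomState(0)
# A dictionary of 4 unit-norm atoms in 6 dimensions.
dictionary = rng.randn(4, 6)
dictionary /= np.linalg.norm(dictionary, axis=1, keepdims=True)
X = rng.randn(2, 6)
# n_jobs=None -> all samples encoded in a single joblib job
code = sparse_encode(X, dictionary, algorithm="omp",
                     n_nonzero_coefs=2, n_jobs=None)
print(code.shape)  # (2, 4): one coefficient row per sample
```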
6 changes: 4 additions & 2 deletions sklearn/decomposition/kernel_pca.py
@@ -89,9 +89,11 @@ class KernelPCA(BaseEstimator, TransformerMixin):

.. versionadded:: 0.18

-    n_jobs : int, default=1
+    n_jobs : int or None, optional (default=None)
         The number of parallel jobs to run.
-        If `-1`, then the number of jobs is set to the number of CPU cores.
+        ``None`` means 1 unless in a :obj:`joblib.parallel_backend` context.
+        ``-1`` means using all processors. See :term:`Glossary <n_jobs>`
+        for more details.

.. versionadded:: 0.18

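Finally, a minimal KernelPCA usage sketch with the updated default; the data and kernel choice are illustrative:

```python
import numpy as np
from sklearn.decomposition import KernelPCA

rng = np.random.RandomState(0)
X = rng.randn(10, 5)
# n_jobs=None -> one joblib job; the projection itself is unchanged
kpca = KernelPCA(n_components=2, kernel="rbf", n_jobs=None)
Xt = kpca.fit_transform(X)
print(Xt.shape)  # (10, 2)
```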