Are there any pitfalls by combining `n_jobs` and `random_state`? · Issue #30811 · scikit-learn/scikit-learn · GitHub
Are there any pitfalls by combining n_jobs and random_state? #30811
@adosar

Description

Discussed in #30809

Originally posted by adosar February 11, 2025
The Controlling randomness guide discusses how to properly control randomness for an estimator, for a CV splitter, or for both together. However, it does not mention whether random_state and n_jobs > 1 interact in any unexpected way.

Let's consider a typical use case where a user cross-validates a RandomForestClassifier with KFold:

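import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import KFold, cross_val_score
estimator = RandomForestClassifier(random_state=np.random.RandomState(1))  # Recommended to pass RandomState instance.
kfold = KFold(shuffle=True, random_state=42)  # Recommended to pass int.
cross_val_score(estimator, ..., cv=kfold, n_jobs=-1)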

Since n_jobs=-1, multiple cores will be used for cross-validation (e.g. one core per fold).

Would the same RNG state be used for the different folds, given that during multiprocessing the estimator, and hence the RNG passed to it, is copied via fork?
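To make the concern concrete, here is a minimal NumPy-only sketch of what "copied with the same state" would mean. It assumes, purely for illustration, that each worker receives its own pickled copy of the RandomState instance; the names worker_a and worker_b are hypothetical stand-ins for per-fold workers. It does not show what cross_val_score actually does internally, which is exactly the open question here.

```python
import pickle

import numpy as np

rng = np.random.RandomState(1)

# Assumption for illustration: each "worker" gets an independent copy of the
# generator, as would happen if the object were pickled and sent to a process.
worker_a = pickle.loads(pickle.dumps(rng))
worker_b = pickle.loads(pickle.dumps(rng))

# Both copies start from the same internal state, so they emit the same stream.
draws_a = worker_a.randint(0, 1_000_000, size=5)
draws_b = worker_b.randint(0, 1_000_000, size=5)

same_stream = bool(np.array_equal(draws_a, draws_b))
print(same_stream)
```

If folds did receive copies like these, every fold would start from the same random stream, whereas a single generator shared in-process would advance from one fit to the next.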
