Closed
Labels
Needs Triage
Description
Discussed in #30809
Originally posted by adosar February 11, 2025
In Controlling randomness, the guide discusses how to properly control randomness for an estimator, for a CV splitter, or for both. However, it does not mention whether random_state and n_jobs > 1 interact in any unexpected way.
Let's consider a typical use case where a user cross-validates a RandomForestClassifier with KFold:
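import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import KFold, cross_val_score
estimator = RandomForestClassifier(random_state=np.random.RandomState(1))  # Recommended to pass RandomState instance.
kfold = KFold(shuffle=True, random_state=42)  # Recommended to pass int.
cross_val_score(estimator, n_jobs=-1, ..., cv=kfold)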
Since n_jobs=-1, multiple cores will be used for cross-validation (e.g. 1 core per fold).
Would the same state be used for the different folds, since during multiprocessing the estimator, and hence the rng passed to it, is copied via fork?
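To make the concern concrete, here is a minimal sketch of what copying a RandomState implies, using only NumPy and the standard library. It does not show what cross_val_score does internally. Whether, where, and how scikit-learn copies the estimator for each fold is exactly the open question, so the deep copy and the pickle round trip below are only assumed stand-ins for "the rng gets copied".

```python
import copy
import pickle

import numpy as np

# A seeded RandomState instance, like the one passed to the estimator above.
rng = np.random.RandomState(1)

# Two assumed ways the rng could end up duplicated: an in-process deep copy
# and a pickle round trip. Both copies are taken before any number is drawn.
rng_deepcopy = copy.deepcopy(rng)
rng_pickled = pickle.loads(pickle.dumps(rng))

draws_original = rng.randint(0, 1000, size=5)
draws_deepcopy = rng_deepcopy.randint(0, 1000, size=5)
draws_pickled = rng_pickled.randint(0, 1000, size=5)

# Each copy carries the same internal state as the original at copy time,
# so all three generators produce the same stream of numbers.
same_stream = bool(
    np.array_equal(draws_original, draws_deepcopy)
    and np.array_equal(draws_original, draws_pickled)
)
print(same_stream)
```

If each fold really received such a copy taken at the same point, every fold would start from an identical random stream. If the folds instead shared one rng object, each fit would advance it for the next one.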