-
-
Notifications
You must be signed in to change notification settings - Fork 25.9k
FIX make deepcopy of parameters in ensemble to avoid sharing mutable instances #18985
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change | ||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
|
@@ -1377,3 +1377,22 @@ def test_little_tree_with_small_max_samples(ForestClass): | |||||||||||||||||
|
||||||||||||||||||
msg = "Tree without `max_samples` restriction should have more nodes" | ||||||||||||||||||
assert tree1.node_count > tree2.node_count, msg | ||||||||||||||||||
|
||||||||||||||||||
|
||||||||||||||||||
@pytest.mark.parametrize('Forest', FOREST_REGRESSORS) | ||||||||||||||||||
def test_mse_criterion_object_segfault_smoke_test(Forest): | ||||||||||||||||||
# Ensure that we can pass a mutable criterion while using parallel fit | ||||||||||||||||||
# Non-regression test for: | ||||||||||||||||||
# https://github.com/scikit-learn/scikit-learn/issues/12623 | ||||||||||||||||||
from sklearn.tree._classes import CRITERIA_REG | ||||||||||||||||||
|
||||||||||||||||||
X = np.random.random((1000, 3)) | ||||||||||||||||||
y = np.random.random((1000, 1)) | ||||||||||||||||||
|
||||||||||||||||||
n_samples, n_outputs = y.shape | ||||||||||||||||||
mse_criterion = CRITERIA_REG['mse'](n_outputs, n_samples) | ||||||||||||||||||
est = FOREST_REGRESSORS[Forest]( | ||||||||||||||||||
n_estimators=2, n_jobs=2, criterion=mse_criterion | ||||||||||||||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. If WDYT @glemaitre ? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Hi @thomasjpfan - I'm resuming this for Alex and was wondering what you think we should do next. If I remember correctly, I think since the underlying issue is specific to a certain presumption of thread-safety when instantiating estimators we all settled on fixing it within There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I was thinking the deepcopy may be costly for certain objects and a more targeted solution would be to copy in scikit-learn/sklearn/tree/_classes.py Lines 344 to 351 in 94abe05
if not isinstance(criterion, Criterion):
...
else:
criterion = copy.deepcopy(criterion) There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This would work for us, although I'm not familiar enough with the rest of the ensemble implementation to know if it categorically covers all potential multi-threading issues. I will update this fix and create a PR asap. Thanks again for looking into this. |
||||||||||||||||||
) | ||||||||||||||||||
|
||||||||||||||||||
est.fit(X, y) |
Uh oh!
There was an error while loading. Please reload this page.