8000 Copy _make_estimator parameters by wyegelwel · Pull Request #12690 · scikit-learn/scikit-learn · GitHub
[go: up one dir, main page]

Skip to content

Copy _make_estimator parameters #12690

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from
Closed

Conversation

wyegelwel
Copy link

Reference Issues/PRs

Fixes #12623

What does this implement/fix? Explain your changes.

The issue is that when trees are created by the base ensemble class, the parameters are not copied. This is problematic in the case of the criterion parameter if a Criterion object is passed in. When this happens, all the trees share the same object and mutate it. If n_jobs > 1, they are mutating the same object concurrently.

Any other comments?

I couldn't think of a good test for this fix. Since the repro case causes a segfault it is hard to add a test case that doesn't cause grief.

@@ -124,7 +125,7 @@ def _make_estimator(self, append=True, random_state=None):
sub-estimators.
"""
estimator = clone(self.base_estimator_)
estimator.set_params(**dict((p, getattr(self, p))
estimator.set_params(**dict((p, copy.deepcopy(getattr(self, p)))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would it make sense rather copy the parameters in the set_params

@amueller
Copy link
Member
amueller commented Aug 6, 2019

Well you can add a smoke test and if it segfaults we know that we messed up, right?

@jnothman
Copy link
Member

@wyegelwel do you intend to complete this?

@aadamson
Copy link
aadamson commented Jan 23, 2020

I'm running into the same issue and confirmed that the patch fixes it. What is left to get this over the line? Does it just need a smoke test? @amueller

@jnothman
Copy link
Member

@aadamson yes, it seems it needs a smoke test and a contributor. You're welcome to take it on

@aadamson
Copy link

Great, thanks. I'll take a look this week.

@cstefansen
Copy link
cstefansen commented Dec 9, 2020

@jnothman or @amueller: I ran into this today, and I am keen to land the fix. Could you say a bit more specifically about what smoke test you had in mind and where (what test file/dir) it should go? It doesn't seg fault every single time, so just want to be clear on what test you deem suitable.

@aadamson
Copy link
aadamson commented Dec 9, 2020

@cstefansen I had a branch that I never pushed with a smoke test that should cover this. See #18985.

@cmarmo cmarmo added the Superseded PR has been replace by a newer PR label Dec 18, 2020
Base automatically changed from master to main January 22, 2021 10:50
@cmarmo
Copy link
Contributor
cmarmo commented Mar 2, 2021

Closing as resolved in #19580

@cmarmo cmarmo closed this Mar 2, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
module:ensemble Needs work Superseded PR has been replace by a newer PR
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Segfault when passing Criterion object to Forest ensembles with n_jobs>1
8 participants
0