sklearn.base.clone() should not make deep copy of all parameters · Issue #16137 · scikit-learn/scikit-learn · GitHub

Closed
csbrown opened this issue Jan 16, 2020 · 11 comments

@csbrown
csbrown commented Jan 16, 2020

duplicate #5563
related to #9696

Describe the workflow you want to enable

sklearn.base.clone does not raise a RuntimeError when the constructor alters the model parameters

Describe your proposed solution

Delete lines 75-82 in sklearn/base.py

Describe alternatives you've considered, if relevant

I currently alter parameters in other methods, and there is no complaint. Since lines 75-82 don't prevent me from altering the parameters, and they force me to hide that functionality in other methods, they are counterproductive.

It is normal Python practice to accept None as a parameter and, if the value is None, set the parameter to some other default. It is not clear what is gained by preventing this behavior. Specifically, sklearn exposes interfaces that are very helpful when building pipelines with my own custom models, and this particular behavior makes building my custom modules hard.
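For illustration (not part of the original report), here is a minimal, self-contained sketch of the identity check that sklearn.base.clone performs. The names toy_clone and BadEstimator are made up, and the check is simplified from the real sklearn/base.py code; the constructor below normalizes its argument into a new list, which is one pattern the real check rejects:

```python
import copy

class BadEstimator:
    """Toy estimator whose constructor transforms its argument."""
    def __init__(self, layers=None):
        # Normalizing the argument creates a new object, so the stored
        # attribute is no longer identical to what was passed in.
        self.layers = list(layers) if layers is not None else []

    def get_params(self):
        return {"layers": self.layers}

def toy_clone(estimator):
    """Simplified sketch of the identity check in sklearn.base.clone."""
    params = {k: copy.deepcopy(v) for k, v in estimator.get_params().items()}
    new_object = type(estimator)(**params)
    for name, value in params.items():
        # If the constructor stored anything other than the exact object
        # it was handed, the clone is not guaranteed to be equivalent.
        if new_object.get_params()[name] is not value:
            raise RuntimeError(
                "Cannot clone object: the constructor either does not set "
                f"or modifies parameter {name!r}"
            )
    return new_object

try:
    toy_clone(BadEstimator())
except RuntimeError as exc:
    print(exc)
```

A constructor that simply stores its arguments unchanged passes this check, because the stored attribute is the exact object that was passed in.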

@jnothman
Member
jnothman commented Jan 16, 2020 via email

@glemaitre
Member

Basically, this is an API choice, and we have a lot of code that relies on the current API. Rephrasing @jnothman's comment: making such a change would alter the way set_params works and affect how GridSearchCV works (cf. https://scikit-learn.org/dev/developers/develop.html#instantiation).

So basically, your estimator should not pass check_estimator, or if it does, we should restrict this usage and add a check to the check_estimator function.

In the short-term, I don't think that we can change anything about it.

In the long-term, one could write a SLEP finding an alternative pattern that provides a much better design without degrading the performance.

@csbrown
Author
csbrown commented Jan 17, 2020

@glemaitre I don't see how this would alter any current functionality, since the estimators in the sklearn library already adhere to the desired API. Note that I am not suggesting that code be inserted into, say, svm.SVC that changes parameters in the initializer. I am saying that EAFP is the ordinary Python paradigm, and that this particular bit of code does not even successfully enforce the API constraints you want to enforce.

@glemaitre
Member

I don't see how this would alter any current functionality, since the estimators in the sklearn library already adhere to the desired API.

If we change sklearn.base.clone, we won't have any safety guard ensuring that we follow our own developer API. Right now, we at least raise an error. And if a third-party library wants to make a compatible estimator that works in GridSearchCV, it will need an estimator that does not raise this error.

@csbrown
Author
csbrown commented Jan 17, 2020

To be clear, I often have to implement models in the following way, specifically so that they will function correctly with GridSearchCV:

class MyModel(object):
    def __init__(self, alpha=0.01, some_param=None):
        self.alpha = alpha
        self.some_param = some_param

    def __init_more__(self):
        if self.some_param is None:
            pass  # actually initialize some_param

model = MyModel()
model.__init_more__()
GridSearchCV(model, {"alpha": (0.01, 0.05)})

This usually happens when I have some very resource-heavy item that MyModel needs access to, such as a tensorflow graph that I want all copies of MyModel to share. If the validation code in sklearn.base.clone were removed, this would be cleaner. I find it dubious that anyone currently relies on the validation code (although they may rely on the API, which will remain unchanged).
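An alternative to the __init_more__ pattern above is to resolve the default lazily in fit(), which keeps the constructor a plain attribute-store and so satisfies clone()'s check. This is a sketch with made-up names, not code from this thread; the heavy resource (e.g. a TensorFlow graph) is faked as a string:

```python
class MyModel:
    def __init__(self, alpha=0.01, some_param=None):
        # Store constructor arguments untouched so clone()'s check passes.
        self.alpha = alpha
        self.some_param = some_param

    def fit(self, X, y=None):
        # Resolve the None default at fit time, not in __init__.
        # Trailing underscore marks a fitted attribute, which clone()
        # never inspects.
        self.some_param_ = (
            self.some_param if self.some_param is not None
            else "pretend-heavy-resource"  # stands in for e.g. a TF graph
        )
        return self

model = MyModel().fit(X=[[0], [1]], y=[0, 1])
print(model.some_param_)  # → pretend-heavy-resource
```

Because self.some_param itself stays None, clones built from get_params() each resolve their own default, while an explicitly passed resource is shared by reference.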

@csbrown
Author
csbrown commented Jan 17, 2020

@glemaitre perhaps a warning would be more appropriate?

@glemaitre
Member

I think what would be more appropriate is a deepcopy parameter in clone to toggle whether parameters are copied.

I think we had an issue about this in the past: #5563. I also started a PR, #9696, which should be rebased and should handle the cloning in GridSearchCV properly.

Then the assertion in clone would happen only when a deep copy is required.
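For illustration, a hypothetical sketch of what such a deepcopy switch could look like. The function clone_sketch, its signature, and the SharesResource class are all made up; this is not the real sklearn.base.clone nor the actual design of #9696:

```python
import copy

def clone_sketch(estimator, deepcopy=True):
    """Hypothetical clone() with a deepcopy switch (illustrative only).

    With deepcopy=False, parameter objects are shared between the
    original and the clone, and the identity assertion is skipped.
    """
    params = estimator.get_params()  # assumes a get_params() method
    if deepcopy:
        params = {k: copy.deepcopy(v) for k, v in params.items()}
    new_object = type(estimator)(**params)
    if deepcopy:
        # The assertion only makes sense when fresh copies were made.
        for name, value in params.items():
            if new_object.get_params()[name] is not value:
                raise RuntimeError(
                    f"constructor modifies parameter {name!r}"
                )
    return new_object

class SharesResource:
    """Toy estimator holding a heavy, shared resource."""
    def __init__(self, resource=None):
        self.resource = resource

    def get_params(self):
        return {"resource": self.resource}

heavy = {"graph": "pretend this is an expensive TF graph"}
a = SharesResource(resource=heavy)
b = clone_sketch(a, deepcopy=False)
print(b.resource is heavy)  # → True: shared, not copied
```

With deepcopy=True the clone gets its own copy of every parameter and the identity assertion fires for misbehaving constructors, matching today's behavior; with deepcopy=False, heavy objects like a shared graph stay shared.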

@glemaitre glemaitre reopened this Jan 17, 2020
@glemaitre
Member

I am reopening the issue and tagging it as a duplicate.

@glemaitre glemaitre added API and removed New Feature labels Jan 17, 2020
@glemaitre glemaitre changed the title Why does clone require that the parameters are not altered by the constructor? sklearn.base.clone() should not make deep copy of all parameters Jan 17, 2020
@adrinjalali
Member

closing for the same reason as #5563 (comment)
