SLEP017: clone override #67
Conversation
When looking at other libraries, I think `__sklearn_is_fitted__` has been well received, since its semantics are simple.

I think the current design of `clone` has put us in a "design corner" where everything we want cloned must go into `__init__`. This `__sklearn_clone__` proposal is the most flexible in terms of what it can do.

A much less flexible alternative is a `__sklearn_clone_parameters__`, which returns a list of additional attributes for `clone` to copy. This would enable the metadata-request and parameter-space use cases.
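To make the comparison concrete, here is a minimal sketch of what the less flexible alternative could look like. The `__sklearn_clone_parameters__` hook, the toy `clone` helper, and the `_metadata_request` attribute are all assumptions for illustration, not the actual scikit-learn API:

```python
import copy

class Estimator:
    """Toy estimator with an assumed __sklearn_clone_parameters__ hook."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def get_params(self):
        # stand-in for sklearn's introspection of __init__ parameters
        return {"alpha": self.alpha}

    def __sklearn_clone_parameters__(self):
        # extra attributes that clone should also copy (assumed protocol)
        return ["_metadata_request"]

def clone(estimator):
    """Sketch: reconstruct from params, then copy any extra declared attributes."""
    new = type(estimator)(**estimator.get_params())
    extra = getattr(estimator, "__sklearn_clone_parameters__", lambda: [])()
    for attr in extra:
        if hasattr(estimator, attr):
            setattr(new, attr, copy.deepcopy(getattr(estimator, attr)))
    return new

est = Estimator(alpha=0.5)
est._metadata_request = {"sample_weight": True}  # hypothetical routing state
cloned = clone(est)
```

Unlike `__sklearn_clone__`, this hook can only add attributes to the copied set; it cannot change how the copy itself is performed (e.g. it could not implement model freezing).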
I'm happy for this to be merged and move to a final discussion/vote.
> Alternative solutions include continuing to force developers into providing
> sometimes-awkward constructor parameters for any clonable material, and
> Scikit-learn core developers having the exceptional ability to extend
> the ``clone`` function as needed.
love the sarcasm lol
I'm happy to merge this as a draft.
In the meantime, I opened scikit-learn/scikit-learn#24568 as an implementation for this SLEP.
@jnothman wanna merge and call for a vote?
I am not sure, but because of linked issues it may close one or more of the following:
- :issue:`8370`, :issue:`9464`: generic estimator wrapper for model freezing
- :issue:`5082`: configuring parameter search spaces
- :issue:`16079`: configuring the routing of sample-aligned metadata
- :issue:`16185`: configuring selected parameters to not be deep-copied
Also, we can finally deprecate pipeline because we have a replacement.
> A centralised implementation of :func:`sklearn.base.clone` regards
> an estimator's constructor parameters as the state that should be copied.
> This proposal allows for an estimator class to implement custom cloning
> functionality with a ``__sklearn_clone__`` method, which will default to
I never know if it's "a sklearn" or "an sklearn", I tend to favor "an sklearn" but you're the native speaker :)
> setting and construction in Scikit-learn.
> In this vein, objections to :issue:`5080` cited the notion that "``clone``
> has a simple contract," and that "extension to it would open the door to
> violations of that contract" [2]_.
I might add to that, that currently the contents of any dictionary are deep-copied, so you can work around any convention by adding a dictionary to the constructor and then populating that dictionary in `fit`.

I.e.
```python
from sklearn.base import BaseEstimator, ClassifierMixin, clone

class ThereIsNoGodOnlyTheFreezer(ClassifierMixin, BaseEstimator):
    def __init__(self, estimator_to_freeze, dictionary_to_deepcopy=None):
        self.estimator_to_freeze = estimator_to_freeze
        # the dict is deep-copied by clone, so the fitted estimator survives
        self.dictionary_to_deepcopy = dictionary_to_deepcopy or {
            "frozen_estimator": estimator_to_freeze
        }

    def fit(self, X, y):
        return self

    def predict(self, X):
        return self.dictionary_to_deepcopy["frozen_estimator"].predict(X)
```
Then
```python
from sklearn.tree import DecisionTreeClassifier
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_validate

X, y = load_iris(return_X_y=True)
dt = DecisionTreeClassifier()
dt.fit(X, y)
cross_validate(ThereIsNoGodOnlyTheFreezer(dt), X, y)
```
returns perfect classification as expected.
+1 on merge and vote :)
> Implementing this SLEP will require:
>
> 1. Factoring out `clone_parametrized` from `clone`, being the portion of its
The way I thought about the refactoring was to have the public `clone` be the `clone_parametrized`, and to factor out the recursion as a private `_clone`. That way the public interface would still be called `clone`, but we could remove `safe` from the interface, since the public one is always safe and the private one is never safe.
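A rough sketch of that split might look like the following. The function bodies are my simplification of sklearn's actual cloning logic, kept only to show how the `safe` flag disappears from the public signature:

```python
import copy

def _clone(obj):
    """Private recursion: never 'safe' -- copies whatever it is given."""
    if isinstance(obj, (list, tuple)):
        return type(obj)(_clone(item) for item in obj)
    if hasattr(obj, "get_params"):
        return clone(obj)
    return copy.deepcopy(obj)

def clone(estimator):
    """Public entry point: always 'safe' -- requires an actual estimator."""
    if not hasattr(estimator, "get_params"):
        raise TypeError("clone() expects an estimator with get_params()")
    params = {k: _clone(v) for k, v in estimator.get_params().items()}
    return type(estimator)(**params)

class Toy:
    """Toy estimator used to exercise the sketch."""

    def __init__(self, alphas=None):
        self.alphas = alphas

    def get_params(self):
        return {"alphas": self.alphas}
```

With this layout, callers who previously relied on `clone(obj, safe=False)` would instead reach for plain `copy.deepcopy`, since the unsafe path is no longer public.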
@jnothman wanna merge and call a vote?
Merging and calling a vote, assuming @jnothman doesn't have time to push this forward right now (@adrinjalali and @glemaitre gave me 👍)
I think SLEP017 should be mentioned in the index.
@lorentzenchr yeah, I noticed that was missing from this PR. I wasn't sure if adding it in the voting PR would be enough; I can send another PR to add it to the index.rst before the vote ends?
Can these be closed/completed?
@haiatn thanks for the note. Closed those issues/PRs. |