8000 SLEP017: clone override by jnothman · Pull Request #67 · scikit-learn/enhancement_proposals · GitHub
[go: up one dir, main page]

Skip to content

SLEP017: clone override #67

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
Oct 31, 2022
Merged

SLEP017: clone override #67

merged 4 commits into from
Oct 31, 2022

Conversation

jnothman
Copy link
Member

No description provided.

@adrinjalali adrinjalali changed the title Draft of SLEP07: clone override Draft of SLEP017: clone override Mar 22, 2022
Copy link
Member
@thomasjpfan thomasjpfan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When looking at other libraries, I think the __sklearn_is_fitted__ is has been well received since the semantics of that is simple.

I think the current design of clone has put us in a "design corner" where everything we want to clone must go into __init__. This __sklearn_clone__ proposal is the most flexibility in terms of what it can do.

A much less flexible alternative is a __sklearn_clone_parameters__, which returns a list of additional attributes for clone to copy. This would enable the metadata requests and the parameter space use case.

Uh oh!

There was an error while loading. Please reload this page.

@jnothman jnothman changed the title Draft of SLEP017: clone override SLEP017: clone override Sep 23, 2022
Copy link
Member
@adrinjalali adrinjalali left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm happy for this to be merged and move to a final discussion/vote.

Comment on lines +104 to +107
Alternative solutions include continuing to force developers into providing
sometimes-awkward constructor parameters for any clonable material, and
Scikit-learn core developers having the exceptional ability to extend
the ``clone`` function as needed.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

love the sarcasm lol

Copy link
Member
@thomasjpfan thomasjpfan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm happy to merge this as a draft.

In the meantime, I opened scikit-learn/scikit-learn#24568 as an implementation for this SLEP.

@adrinjalali
Copy link
Member

@jnothman wanna merge and call for a vote?

@haiatn
Copy link
haiatn commented Oct 5, 2022

I am not sure but because of linked issues it may close one or more of the following:
scikit-learn/scikit-learn#5563
scikit-learn/scikit-learn#9696
scikit-learn/scikit-learn#16137
scikit-learn/scikit-learn#16185

- :issue:`8370`, :issue:`9464`: generic estimator wrapper for model freezing
- :issue:`5082`: configuring parameter search spaces
- :issue:`16079`: configuring the routing of sample-aligned metadata
- :issue:`16185`: configuring selected parameters to not be deep-copied
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also we can deprecate pipeline finally because we have a replacement.

A centralised implementation of :func:`sklearn.base.clone` regards
an estimator's constructor parameters as the state that should be copied.
This proposal allows for an estimator class to implment custom cloning
functionality with a ``__sklearn_clone__`` method, which will default to
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I never know if it's "a sklearn" or "an sklearn", I tend to favor "an sklearn" but you're the native speaker :)

setting and construction in Scikit-learn.
In this vein, objections to :issue:`5080` cited the notion that "``clone``
has a simple contract," and that "extension to it would open the door to
violations of that contract" [2]_.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I might add to that, that currently the content of any dictionary is deepcopied, so you can work around any convention by adding a dictionary to the constructor and then populating that dictionary in fit.

i.e.

from sklearn.base import BaseEstimator, ClassifierMixin, clone

class ThereIsNoGodOnlyTheFreezer(ClassifierMixin, BaseEstimator):
    def __init__(self, estimator_to_freeze, dictionary_to_deepcopy=None):
        self.estimator_to_freeze = estimator_to_freeze
        self.dictionary_to_deepcopy = dictionary_to_deepcopy or {'frozen_estimator': estimator_to_freeze}
    def fit(self, X, y):
        return self
    def predict(self, X):
        return self.dictionary_to_deepcopy['frozen_estimator'].predict(X)

Then

from sklearn.tree import DecisionTreeClassifier
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_validate

X, y = load_iris(return_X_y=True)
dt = DecisionTreeClassifier()
dt.fit(X, y)
cross_validate(ThereIsNoGodOnlyTheFreezer(dt), X, y)

returns perfect classification as expected.

@amueller
Copy link
Member

+1 on merge and vote :)


Implementing this SLEP will require:

1. Factoring out `clone_parametrized` from `clone`, being the portion of its
Copy link
Member

Choose a reason for hiding this comme 8000 nt

The reason will be displayed to describe this comment to others. Learn more.

The way I thought about the refactoring was to have the public clone be the clone_parametrized and factor out the recursion as a private _clone. That way the public interface would still be called clone, but we could remove the safe from the interface as public one is always safe and the private one is never safe.

@amueller
Copy link
Member

@jnothman wanna merge and call a vote?

@amueller
Copy link
Member

Merging and calling a vote, assuming @jnothman doesn't have time to push this forward right now (@adrinjalali and @glemaitre gave me 👍)

@amueller amueller merged commit 3a36b1d into scikit-learn:master Oct 31, 2022
@lorentzenchr
Copy link
Member
77F4 lorentzenchr commented Nov 1, 2022

I think SLEP017 should be mentioned in index.rst in to be able to find in in the homepage navigation.

@amueller
Copy link
Member
amueller commented Nov 1, 2022

@lorentzenchr yeah I noticed that that was missing from this PR. I wasn't sure if adding it in the voting PR would be enough, I can send another PR to add it to the index.rst before the vote ends?

@amueller amueller mentioned this pull request Nov 1, 2022
@haiatn
Copy link
haiatn commented Aug 26, 2023

I am not sure but because of linked issues it may close one or more of the following: scikit-learn/scikit-learn#5563 scikit-learn/scikit-learn#9696 scikit-learn/scikit-learn#16137 scikit-learn/scikit-learn#16185

Can these be closed/completed?

@adrinjalali
Copy link
Member

@haiatn thanks for the note. Closed those issues/PRs.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants
0