-
-
Notifications
You must be signed in to change notification settings - Fork 25.8k
[WIP] ENH estimator freezing to stop it being cloned/refit #8374
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
(The semantics of freezing a list/array-like still needs clarification. If we want it to work with pipeline steps, it needs to know how to handle lists of tuples including estimators and strings.) |
Any opinion on whether |
Codecov Report
@@ Coverage Diff @@
## master #8374 +/- ##
==========================================
+ Coverage 94.75% 94.75% +<.01%
==========================================
Files 342 342
Lines 60801 60847 +46
==========================================
+ Hits 57609 57653 +44
- Misses 3192 3194 +2
Continue to review full report at Codecov.
|
I can't say I understand why codecov thinks those lines are untested. |
I've tried to work out where this might be documented (wrt semi-supervised/transfer learning? calibration etc? pipeline inspection?) and haven't come up with a best home for it yet. |
But perhaps I should write the docs first... |
if copy: | ||
estimator = deepcopy(estimator) | ||
estimator.fit = _FrozenFit(estimator) | ||
if hasattr(estimator, 'fit_transform'): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would just remove fit_transform
and fit_predict
as well. Downstream users should be able to duck-type around using these.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or did you consider them required API? The transformer and cluster base classes have those, but they are not really part of the API contract imho.
Take a pipeline with a frozen transformer: either frozen_fit (or whatever
we call it) needs to handle the fit_transform-while-frozen case, or
pipeline does, since pipeline currently calls fit_transform when available.
I don't know that we need to handle other fit_ prefixes, but if it's
handled by the metaestimator, why not?
|
Yeah, so if you remove them, it'll work fine ;) |
I don't get it, but i think I've mostly made up my mind.
…On 27 Jul 2017 1:04 am, "Andreas Mueller" ***@***.***> wrote:
calls fit_transform when available
Yeah, so if you remove them, it'll work fine ;)
—
You are receiving this because you authored the thread.
Reply to this email directly, view it on GitHub
<#8374 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AAEz68ea8NiVDKnxyTqlm48Es2Da2ND1ks5sR1V0gaJpZM4MDGar>
.
|
My proposal is to remove |
Though I thought it was easier to remove a method from an object. It doesn't seem to be possible without hacking getattr so it's probably not worth it :-/ |
You made up your mind in which direction? Not catering to old meta-estimators? |
Oh, didn't catch up with the other PR, you think this or similar is the right solution, got it. |
assert_array_equal(est.scores_, frozen_est2.scores_) | ||
|
||
# scores should be unaffected by new fit | ||
assert_true(frozen_est2.fit() is frozen_est2) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is always true, right? Well, I guess you're testing that fit can be called without arguments?
estimator = deepcopy(estimator) | ||
estimator.fit = _FrozenFit(estimator) | ||
if hasattr(estimator, 'fit_transform'): | ||
estimator.fit_transform = functools.partial(_frozen_fit_method, |
There was a problem hid 8000 ing this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is because estimator.transform
might not exist, and we want to provide an attribute error when fit_transform
is called, and not when freeze
is called, right?
Maybe write a comment about that or maybe rename it to make that more clear?
sklearn/base.py
Outdated
@@ -523,3 +526,51 @@ def is_classifier(estimator): | |||
def is_regressor(estimator): | |||
"""Returns True if the given estimator is (probably) a regressor.""" | |||
return getattr(estimator, "_estimator_type", None) == "regressor" | |||
|
|||
|
|||
class _FrozenFit(object): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel the boolean flag is somewhat either to understand / check.
Fixes #8370
A whole lot less magic than #8372, but still requires estimator to have a
__dict__
.