[WIP] ENH estimator freezing to stop it being cloned/refit #8374
jnothman wants to merge 7 commits into scikit-learn:master from
Conversation
(The semantics of freezing a list/array-like still need clarification. If we want it to work with pipeline steps, it needs to know how to handle lists of tuples containing estimators and strings.)

Any opinion on whether
Codecov Report
@@            Coverage Diff             @@
##           master    #8374      +/-   ##
==========================================
+ Coverage   94.75%   94.75%     +<.01%
==========================================
  Files         342      342
  Lines       60801    60847        +46
==========================================
+ Hits        57609    57653        +44
- Misses       3192     3194         +2
Continue to review full report at Codecov.
I can't say I understand why codecov thinks those lines are untested.

I've tried to work out where this might be documented (w.r.t. semi-supervised/transfer learning? calibration etc.? pipeline inspection?) and haven't come up with a best home for it yet.

But perhaps I should write the docs first...
    if copy:
        estimator = deepcopy(estimator)
    estimator.fit = _FrozenFit(estimator)
    if hasattr(estimator, 'fit_transform'):
I would just remove fit_transform and fit_predict as well. Downstream users should be able to duck-type around using these.
Or did you consider them required API? The transformer and cluster base classes have those, but they are not really part of the API contract imho.
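For context, here is a minimal runnable sketch of the freezing approach the diff hunks above come from. The shape follows the snippets in this PR, but the details are filled in for illustration: the no-op fit is a lambda rather than the PR's _FrozenFit class, and the freeze/_frozen_fit_method names are stand-ins for whatever the public helper ends up being called.

```python
import functools
from copy import deepcopy


def _frozen_fit_method(estimator, method, X, y=None, **fit_params):
    # "Fitting" a frozen estimator just delegates to the already-fitted
    # method, so fit_transform(X) behaves like transform(X).
    return getattr(estimator, method)(X)


def freeze(estimator, copy=True):
    """Return an estimator whose fit is a no-op returning the estimator."""
    if copy:
        estimator = deepcopy(estimator)
    # Shadowing class-level methods via the instance __dict__ is why this
    # approach requires the estimator to have a __dict__ at all.
    estimator.fit = lambda *args, **kwargs: estimator
    if hasattr(estimator, 'fit_transform'):
        estimator.fit_transform = functools.partial(
            _frozen_fit_method, estimator, 'transform')
    return estimator
```

With this, calling fit (or fit_transform) on a frozen, pre-fitted transformer leaves its learned state untouched.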
Take a pipeline with a frozen transformer: either frozen_fit (or whatever we call it) needs to handle the fit_transform-while-frozen case, or Pipeline does, since Pipeline currently calls fit_transform when available. I don't know that we need to handle other fit_* prefixes, but if it's handled by the meta-estimator, why not?
Yeah, so if you remove them, it'll work fine ;)
I don't get it, but I think I've mostly made up my mind.
My proposal is to remove
Though I thought it was easier to remove a method from an object. It doesn't seem to be possible without hacking __getattr__, so it's probably not worth it :-/
You made up your mind in which direction? Not catering to old meta-estimators?

Oh, I didn't catch up with the other PR; you think this or something similar is the right solution, got it.
    assert_array_equal(est.scores_, frozen_est2.scores_)

    # scores should be unaffected by new fit
    assert_true(frozen_est2.fit() is frozen_est2)
This is always true, right? Well, I guess you're testing that fit can be called without arguments?
    estimator = deepcopy(estimator)
    estimator.fit = _FrozenFit(estimator)
    if hasattr(estimator, 'fit_transform'):
        estimator.fit_transform = functools.partial(_frozen_fit_method,
This is because estimator.transform might not exist, and we want an AttributeError to be raised when fit_transform is called, not when freeze is called, right?
Maybe write a comment about that, or rename it to make that clearer?
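In other words (a sketch of the point being made; NoTransform is an invented toy class): binding via functools.partial defers the attribute lookup, so freezing succeeds even on an estimator without transform, and the AttributeError only surfaces when fit_transform is actually called.

```python
import functools


def _frozen_fit_method(estimator, method, X, y=None, **kwargs):
    # getattr happens here, at call time, not when the partial is built,
    # so an estimator lacking `transform` still freezes cleanly.
    return getattr(estimator, method)(X)


class NoTransform:
    def fit(self, X, y=None):
        return self


est = NoTransform()
# Freeze time: building the partial never touches est.transform.
est.fit_transform = functools.partial(_frozen_fit_method, est, 'transform')
```

Only a subsequent est.fit_transform(X) call raises the AttributeError.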
    return getattr(estimator, "_estimator_type", None) == "regressor"


class _FrozenFit(object):
I feel the boolean flag is somewhat easier to understand / check.
Fixes #8370
A whole lot less magic than #8372, but still requires the estimator to have a __dict__.