[WIP] ENH estimator freezing to stop it being cloned/refit by jnothman · Pull Request #8374 · scikit-learn/scikit-learn · GitHub


[WIP] ENH estimator freezing to stop it being cloned/refit #8374


Closed
wants to merge 7 commits into from

Conversation

jnothman
Member

Fixes #8370

A whole lot less magic than #8372, but still requires estimator to have a __dict__.
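The idea can be sketched in a few lines. This is a minimal illustration of the approach (not the PR's actual code), using a hypothetical `ToyEstimator` stand-in: because instance attributes shadow class attributes, binding a no-op `fit` into the instance `__dict__` disables refitting for that object only, while the fitted state survives.

```python
from copy import deepcopy


class ToyEstimator:
    """Hypothetical stand-in for a fitted scikit-learn estimator."""
    def fit(self, X=None, y=None):
        self.coef_ = 42
        return self


def freeze(estimator, copy=True):
    # Optionally copy so the caller's estimator is left untouched.
    if copy:
        estimator = deepcopy(estimator)
    # Instance attributes shadow class attributes, so this no-op ``fit``
    # lives in the instance ``__dict__`` (hence the requirement that the
    # estimator has one) and suppresses refitting for this object only.
    estimator.fit = lambda *args, **kwargs: estimator
    return estimator


est = ToyEstimator().fit()
frozen = freeze(est)
frozen.fit("totally different data")  # no-op: fitted state is preserved
```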

@jnothman jnothman changed the title [WIP] ENH add freeze method which stops an estimator being cloned/refit [WIP] ENH estimator freezing to stop it being cloned/refit Feb 16, 2017
@jnothman
Member Author

(The semantics of freezing a list/array-like still needs clarification. If we want it to work with pipeline steps, it needs to know how to handle lists of tuples including estimators and strings.)

@jnothman
Member Author
jnothman commented Feb 16, 2017

Any opinion on whether freeze belongs in base or in some new module (sklearn.reuse, for instance)?

@codecov
codecov bot commented Feb 17, 2017

Codecov Report

Merging #8374 into master will increase coverage by <.01%.
The diff coverage is 95.91%.

@@            Coverage Diff             @@
##           master    #8374      +/-   ##
==========================================
+ Coverage   94.75%   94.75%   +<.01%     
==========================================
  Files         342      342              
  Lines       60801    60847      +46     
==========================================
+ Hits        57609    57653      +44     
- Misses       3192     3194       +2
Impacted Files Coverage Δ
sklearn/tests/test_base.py 97.2% <100%> (+0.4%)
sklearn/base.py 93.68% <90.47%> (-0.47%)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 8beaf32...705416b. Read the comment docs.

@jnothman
Member Author

I can't say I understand why codecov thinks those lines are untested.

@jnothman
Member Author

I've tried to work out where this might be documented (wrt semi-supervised/transfer learning? calibration etc? pipeline inspection?) and haven't come up with a best home for it yet.

@jnothman
Member Author

But perhaps I should write the docs first...

if copy:
    estimator = deepcopy(estimator)
estimator.fit = _FrozenFit(estimator)
if hasattr(estimator, 'fit_transform'):
Member

I would just remove fit_transform and fit_predict as well. Downstream users should be able to duck-type around using these.

Member

Or did you consider them required API? The transformer and cluster base classes have those, but they are not really part of the API contract imho.

@jnothman
Member Author
jnothman commented Jul 26, 2017 via email

@amueller
Member

calls fit_transform when available

Yeah, so if you remove them, it'll work fine ;)

@jnothman
Member Author
jnothman commented Jul 26, 2017 via email

@amueller
Member

My proposal is to remove fit_transform and fit_predict from estimators if they are frozen.

@amueller
Member

Though I thought it was easier to remove a method from an object. It doesn't seem to be possible without hacking __getattr__, so it's probably not worth it :-/
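A minimal illustration of the point above, using a hypothetical `Estimator` class: a method defined on the class is not in the instance `__dict__`, so it cannot be deleted per-instance; hiding it would require `__getattr__`-level tricks.

```python
class Estimator:
    """Hypothetical class with a class-level method."""
    def fit_transform(self, X):
        return X


est = Estimator()
try:
    # The method lives on the class, not in the instance __dict__,
    # so per-instance deletion fails:
    del est.fit_transform
except AttributeError:
    print("cannot delete a class-level method via the instance")

# It remains reachable through normal attribute lookup:
assert hasattr(est, 'fit_transform')
```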

@amueller
Member

Have you made up your mind which direction? Not catering to old meta-estimators?

@amueller
Member

Oh, didn't catch up with the other PR, you think this or similar is the right solution, got it.

assert_array_equal(est.scores_, frozen_est2.scores_)

# scores should be unaffected by new fit
assert_true(frozen_est2.fit() is frozen_est2)
Member

This is always true, right? Well, I guess you're testing that fit can be called without arguments?

estimator = deepcopy(estimator)
estimator.fit = _FrozenFit(estimator)
if hasattr(estimator, 'fit_transform'):
    estimator.fit_transform = functools.partial(_frozen_fit_method,
Member

This is because estimator.transform might not exist, and we want to provide an attribute error when fit_transform is called, and not when freeze is called, right?
Maybe write a comment about that or maybe rename it to make that more clear?
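A sketch of why the partial defers the error. The helper body below is a guess at the intent behind the PR's `_frozen_fit_method`, not its actual code: the `getattr` lookup runs only when the partial is called, so a missing `transform` raises AttributeError at `fit_transform()` time rather than at freeze time.

```python
import functools


def _frozen_fit_method(estimator, method, X, *args, **kwargs):
    # getattr runs at call time, so a missing ``transform`` surfaces
    # as an AttributeError when fit_transform() is invoked, not when
    # the estimator is frozen.
    return getattr(estimator, method)(X)


class NoTransform:
    """Hypothetical estimator with no transform method."""


est = NoTransform()
# Freezing succeeds even though ``transform`` is absent...
est.fit_transform = functools.partial(_frozen_fit_method, est, 'transform')
# ...and the error only surfaces on use:
try:
    est.fit_transform([[0.0]])
except AttributeError as exc:
    print("deferred:", exc)
```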

sklearn/base.py Outdated
@@ -523,3 +526,51 @@ def is_classifier(estimator):
def is_regressor(estimator):
"""Returns True if the given estimator is (probably) a regressor."""
return getattr(estimator, "_estimator_type", None) == "regressor"


class _FrozenFit(object):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel a boolean flag would be somewhat easier to understand / check.
