8000 Make it possible to pickle a fit `GridSearchCV` and `RandomizedSearchCV` by jnothman · Pull Request #1801 · scikit-learn/scikit-learn · GitHub
[go: up one dir, main page]

Skip to content

Make it possible to pickle a fit GridSearchCV and RandomizedSearchCV #1801

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 2 commits into from

Conversation

jnothman
Copy link
Member

Fix for #1789:

  • don't make best_estimator_'s methods available by copying its methods. Instead link to its attributes by properties (also, added properties for decision_function and transform where predict, predict_proba and score had been available).
  • and don't declare CVScoresTuple in a closure.

…er search

Avoids copying an unpicklable method to parameter search's __dict__.

Also adds decision_function and transform where only predict and predict_proba were available before.
Define CVScoreTuple in the module namespace.
@amueller
Copy link
Member

Great, I think this is the right solution. Not sure we need the underscore in front of the CVScoresTuple, though. Could you please make the test not only smoke tests? For example unpickle and see if the results of predict are the same.
Also you can assert that the results of transform and decision_function are the same as the ones of best_estimator_. Which estimator did you test transform with? That only works with LDA ,right?

One downside of your solution is that it is no longer possible to ducktype the presence of decision_function (for example). Not sure this would be possible, though. If an base estimator doesn't provide a given function, an AttributeError is rased, correct?

@jnothman
Copy link
Member Author

I'll add some non-smoke testing of the pickle soon.

I'm only smoke-testing transform, etc. with test_grid_search.MockClassifier. And estimators with L1 regularisation also implement transform.

To answer your second point (I hadn't been sure of the answer myself):

class O(object):
  @property
  def a(self):
    return self.b
>>> o = O()
>>> hasattr(o, 'a')
False
>>> o.b = 5
>>> hasattr(o, 'a')
True

I.e. if AttributeError is raised in property.__get__, hasattr will return False.

This too should be tested somehow, but I'm not sure if it's for this patch, or generally an issue for meta-classifiers and functions only available after fit().

This suggests that anything relying on best_estimator_ is best implemented as a property (including score)!

@jnothman
Copy link
Member Author

Actually it looks like GridSearchCV.score is broken: it uses a deprecated definition of scorer, and only when the classifier doesn't have a score function. Is that correct, or useful for any particular purpose?

And rethinking, I'm not really sure how/why to test pickling: isn't it enough to assume that if pickle works and is tested, as long as no error occurs it's fine?

@larsmans
Copy link
Member

LGTM, 👍 for merge.

(@GaelVaroquaux might like to give his opinion too, since this involves properties...)

@GaelVaroquaux
Copy link
Member

(@GaelVaroquaux might like to give his opinion too, since this involves
properties...)

Valid usecase for properties: adaption pattern, which is what they are
for IMHO.

@larsmans
Copy link
Member

@amueller Merge?

@amueller
Copy link
Member

Sorry, have been offline for week as you might have noticed. LGTM.

@jnothman If the runtime of a real test is basically the same as the smoke test, I don't see why we shouldn't do it. No idea what could break if you don't test for it ;)

@amueller
Copy link
Member
amueller commented Apr 2, 2013

Merged by rebase. Thanks :)

@amueller amueller closed this Apr 2, 2013
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants
0