-
-
Notifications
You must be signed in to change notification settings - Fork 25.9k
[MRG + 1] Add refit_time_ attribute to model selection #11310
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
In particular to GridSearchCV and RandomizedSearchCV. Implements scikit-learn#8833.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why WIP? What's the todo list?
sklearn/model_selection/_search.py
Outdated
refit_time_ : float | ||
Seconds used for refitting the best model on the whole dataset. | ||
|
||
This is present only if ``refit`` is set to ``True``. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you mean "if ``refit`` is not False"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks a lot, I just fixed the wording.
I wasn't sure if the tests/CI will pass, therefore I decided that this PR isn't ready yet. If that's wrong I'm happy to change the name prior to the tests passing. |
There appears to be an issue with time measurement and Windows. Can I disable the check that the refit time is larger than zero for Windows? |
doc/whats_new/v0.20.rst
Outdated
:class:`model_selection.RandomizedSearchCV` if ``refit`` is set to ``True``. | ||
This will allow measuring the complete time it takes to perform | ||
hyperparameter optimization and refitting the best model on the whole | ||
dataset. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add :user:
thing and PR # (since there's no issue).
assert_true(hasattr(search, "refit_time_")) | ||
assert_true(isinstance(search.refit_time_, float)) | ||
if sys.version_info[0] >= 3: | ||
assert_greater(search.refit_time_, 0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why not >= as for the other parameters?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just did this - why didn't I realize this before... Please excuse the extra commits.
Unit tests are passing except for one test on travis-ci which fails because of a server error on mldata.org. |
ugh that's annoying. I restarted that one. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks!
Reference Issues/PRs
Resolves #8833.
What does this implement/fix? Explain your changes.
This PR implements a new
refit_time_
attribute which will be stored in GridSearchCV and RandomizedSearchCV ifrefit
is set toTrue
. This will allow measuring the complete time it takes to perform + hyperparameter optimization and refitting the best model on the whole dataset and will be particularly helpful if the hyperparameter optimization is carried out in parallel.