MRG Random sampling hyper parameters #1194
Conversation
This draft does not support tree-like parameter spaces btw.
About tree-structured spaces: I think it's ok to do without them because you're doing random search, so having extra parameters which are ignored doesn't really cost you anything in statistical efficiency. In general I'm +1 for merging this code, as I think it's useful and we should steer people towards doing more random search and less explicit grid search (especially as adding more hyperparameters can often improve performance at the expense of more tuning).
    params[k] = v[rnd.randint(len(v))]
yield params
Minor point -- code using yield cannot be serialized, but it is often helpful to serialize long-running code such as hyper-parameter optimization. Consider making `__iter__` return `self` and adding a `next` method that returns a params sample?
I'm happy enough to do this as a PR of my own later if you like, so that any discussion can be separate from this PR. Anyway, there may be other serialization problems; this might be necessary but insufficient.
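A minimal sketch of that suggestion, with illustrative names and details that are not the PR's code:

```python
import numpy as np


class SampledParams(object):
    """Hypothetical iterator whose __iter__ returns self; unlike a
    generator, instances like this can in principle be pickled
    (assuming their attributes can be)."""

    def __init__(self, param_distributions, n_iter, random_state=0):
        self.param_distributions = param_distributions
        self.n_iter = n_iter
        self.rng = np.random.RandomState(random_state)
        self.n_drawn = 0

    def __iter__(self):
        return self

    def next(self):
        if self.n_drawn >= self.n_iter:
            raise StopIteration
        self.n_drawn += 1
        # Call .rvs() on distribution objects; sample uniformly
        # from plain lists of discrete values.
        return dict((k, v.rvs() if hasattr(v, 'rvs')
                     else v[self.rng.randint(len(v))])
                    for k, v in self.param_distributions.items())

    __next__ = next  # Python 3 spelling of the same method
```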
Will change it, as the draws are independent anyway...
I'm not sure how to do this right. It should only sample `n_iter` times, but I want to be able to iterate over it more than once. I guess that means I have to distinguish between container and iterator. Not sure how to do this without creating another class. Any help?
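For reference, the container/iterator split could keep `yield` but hide it behind the container's `__iter__`, so every pass gets a fresh, independent generator. A hypothetical sketch, assuming for brevity that all values are distribution objects with `.rvs()`:

```python
class ParamContainer(object):
    """Hypothetical container: every call to __iter__ starts a fresh
    pass of n_iter samples, so the object can be iterated over more
    than once."""

    def __init__(self, param_distributions, n_iter):
        self.param_distributions = param_distributions
        self.n_iter = n_iter

    def __iter__(self):
        # Each call returns a new, independent generator. Note that
        # the draws differ between passes unless a seed is fixed here.
        for _ in range(self.n_iter):
            yield dict((k, v.rvs())
                       for k, v in self.param_distributions.items())
```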
@amueller AFAIK you're right, you can't get independent iterators like yield gives you without another class. If this is, in your judgement, a pain, then leave it for later / never.
Creating custom iterators is easy enough (I just did it), but pickling them is a pain because scipy.stats distributions can't be pickled.
What needs to be done to get this merged?
At least an example, documentation and testing. Sorry this stalled. Not sure if I have time for it on the weekend, I have some other PRs. There are some changes I made to GridSearchCV in this PR that need to be reverted. Generally, it needs to be rebased, but that should be easy. Not sure if everyone is happy with the API; none of the core devs voiced an opinion afaik.
ok, rebased. Code should be fine now.
Cool. If I push to your repo, will the changes appear here?
they are here I think.
I did some refactoring and added an example. Apart from the docs, I think this should be good to go. Will try to do the user guide tomorrow.
Added docs, renamed PR to MRG.
Any reviews please?
I'll add that one to my todo list. Unfortunately I don't have time today.
Thanks :) don't worry, you did so much reviewing for me yesterday already! I just didn't want it to get lost ;)
+1 LGTM
May I lazily ask a question, without having read the code or the paper? This just takes random samples of the hyperparameters and remembers the best sample, right? The reason I'm asking is because I wonder what will happen to ...
For each sampled set of parameters, cross-validation is run; that hopefully makes the score robust to the seed. Does that answer your question?
Rebased and stuff, should be mergeable again.
self.estimator_scores_ = [
    (clf_params, score, all_scores)
    for clf_params, (score, _), all_scores
    in zip(parameter_iterator, scores, cv_scores)]
Is the comment still relevant?
no. Thanks :)
@arjoly thanks for your comments, I'll address them shortly :)
Ok, should be good now. If there are no more comments, I'll squash and merge.
@@ -72,11 +75,141 @@ def __iter__(self):
            yield params


class IterGrid(ParameterGrid):
    """Generators on the combination of the various parameter lists given.
You should add a note on the deprecation either here or at the end of the docstring.
Code coverage is good.
In the example, rather than just reporting the top parameters, it would be better to report the, say, top 3 models along with the standard error of the mean validation score, along the lines of the sketch below. We could even add such a reporting helper to the library.
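A sketch of such a report, assuming the (params, mean_score, cv_scores) tuples seen in estimator_scores_ above; the function name is made up:

```python
from operator import itemgetter

import numpy as np


def report(grid_scores, n_top=3):
    """Print the n_top parameter settings with the mean validation
    score and its standard error across CV folds."""
    top = sorted(grid_scores, key=itemgetter(1), reverse=True)[:n_top]
    for rank, (params, mean_score, scores) in enumerate(top, 1):
        scores = np.asarray(scores)
        sem = scores.std() / np.sqrt(len(scores))
        print("Model rank: %d" % rank)
        print("Mean validation score: %.3f (+/- %.3f SEM)"
              % (mean_score, sem))
        print("Parameters: %s" % (params,))
```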
>>> list(ParameterSampler(param_grid, n_iter=4))
...     #doctest: +NORMALIZE_WHITESPACE +ELLIPSIS
[{'a': ..., 'b': ...}, {'a': ..., 'b': ...},
 {'a': ..., 'b': ...}, {'a': ..., 'b': ...}]
Wouldn't it be possible to fix the seed of the sampler and display the values of `a` and the first 3 digits of `b` before the ellipsis in the output?
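Something in this direction could work, sketched here under two assumptions that may not hold in this PR: that `ParameterSampler` takes a `random_state` argument, and that the doctest's `param_grid` pairs a discrete list with a scipy.stats distribution:

```python
import scipy.stats

from sklearn.grid_search import ParameterSampler  # module path of this era

# A param_grid shaped like the doctest's: 'a' discrete, 'b' continuous.
param_grid = {'a': [1, 2], 'b': scipy.stats.expon()}

# Fix the seed, then round the continuous draws so the doctest can
# print the first 3 digits of 'b' instead of hiding it behind an
# ellipsis. (random_state here is the assumed argument.)
sampler = ParameterSampler(param_grid, n_iter=4, random_state=0)
samples = [dict((k, round(v, 3) if isinstance(v, float) else v)
                for k, v in s.items()) for s in sampler]
print(samples)
```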
The doctest failure on travis is real: missing an ellipsis.
Not sure why that changed... but ellipses seem like a good idea anyway.
Oh, my last commit fixes an unrelated bug in master. I should probably add a test.
So I guess I shouldn't use ellipsis because this caught a bug ^^
should I put the bugfix + test in a separate PR?
I would rather not add a function for reporting just now. I started something at some point but it was not generic enough. Once this is in, I'd like to revisit my ideas and create helper functions or an interface for GridSearchCV and RandomizedSearchCV that is more powerful and easier to use. I'll take your code for the example, though ;)
That would be great. We can get a quick merge for the bugfix.
Alright. Let's add the reporting function only in the example for now. It's likely to change if we add train time + train score anyway.
there was no test for the ...
See #1731 for the bugfix / regression test.
Ok, I think I addressed your comments.
Looks good to me :) +1 for merging.
Ok, I'll squash and merge.
@larsmans is there a good way to squash commits if I merged master in-between? I could drop all the history for a clean commit, not sure that is a great idea, though.
Ok, I merged without history. Everything else would be way too messy and/or too much work. Thanks for the reviews, guys :)
You definitely deserve a 🍺 for this one :)
Or maybe a Caol Ila :)
Not sure I deserve one, will take it anyway ;)
Cool !!! Congratz @amueller !!!
This is another shot at implementing randomized hyper parameter search in sklearn.
In contrast to #455, I use scipy.stats distributions for each parameter.
If I understand James' paper correctly, one of the important points is not to use a grid.
Imagine having one totally irrelevant parameter. Using a grid basically makes a large part of the samples redundant, while using a continuous distribution is not affected by the presence of the irrelevant parameter at all.
For discrete parameters, the user can also specify a list of values, e.g. `kernel=['rbf', 'linear', 'poly']`, which is sampled from using a uniform distribution. I don't like passing objects as parameters, but this seems to be the best way to specify a distribution. A user can pass in any object that supports `.rvs()`, and scipy.stats.distributions already has a solid amount of predefined ones.
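For illustration, a search specified in this style might look as follows (a sketch: RandomizedSearchCV lived in sklearn.grid_search in this era, and the parameter ranges here are made up):

```python
import scipy.stats

from sklearn.datasets import load_digits
from sklearn.grid_search import RandomizedSearchCV
from sklearn.svm import SVC

# Continuous parameters get an object with .rvs(); discrete
# parameters get a list that is sampled from uniformly.
param_distributions = {
    'C': scipy.stats.expon(scale=100),
    'gamma': scipy.stats.expon(scale=0.1),
    'kernel': ['rbf', 'linear', 'poly'],
}

digits = load_digits()
search = RandomizedSearchCV(SVC(), param_distributions, n_iter=20)
search.fit(digits.data, digits.target)
print(search.best_params_)
```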