-
-
Notifications
You must be signed in to change notification settings - Fork 25.8k
FIX param_distribution
param of HalvingRandomSearchCV
accepts list of dicts
#26893
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
FIX param_distribution
param of HalvingRandomSearchCV
accepts list of dicts
#26893
Conversation
param_distribution
param of HalvingRandomSearchCV
accepts lists of dicts param_distribution
param of HalvingRandomSearchCV
accepts list of dicts
I think the main reasons of the failures come from a too small dataset and not enough candidates. Here's a slightly modified version: def test_halving_random_search_cv_results():
X, y = make_classification(n_samples=150, n_features=4, random_state=42)
params = [
{"kernel": ["rbf"], "C": expon(scale=10), "gamma": expon(scale=0.1)},
{"kernel": ["poly"], "degree": [2, 3]},
]
param_keys = ("param_C", "param_degree", "param_gamma", "param_kernel")
score_keys = (
"mean_test_score",
"mean_train_score",
"rank_test_score",
"split0_test_score",
"split1_test_score",
"split2_test_score",
"split0_train_score",
"split1_train_score",
"split2_train_score",
"std_test_score",
"std_train_score",
"mean_fit_time",
"std_fit_time",
"mean_score_time",
"std_score_time",
)
extra_keys = ("n_resources", "iter")
search = HalvingRandomSearchCV(
SVC(),
cv=3,
param_distributions=params,
return_train_score=True,
random_state=0,
)
search.fit(X, y)
n_candidates = sum(search.n_candidates_)
cv_results = search.cv_results_
# Check results structure
check_cv_results_keys(
cv_results, param_keys, score_keys, n_candidates, extra_keys
)
check_cv_results_array_types(search, param_keys, score_keys)
assert all(
(
cv_results["param_C"].mask[i]
and cv_results["param_gamma"].mask[i]
and not cv_results["param_degree"].mask[i]
)
for i in range(n_candidates)
if cv_results["param_kernel"][i] == "poly"
)
assert all(
(
not cv_results["param_C"].mask[i]
and not cv_results["param_gamma"].mask[i]
and cv_results["param_degree"].mask[i]
)
for i
8000
in range(n_candidates)
if cv_results["param_kernel"][i] == "rbf"
)
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please also add a changelog entry for 1.3.1 in v1.3.rst
Thanks for reviewing, @jeremiedbb, for your help and the explanations. I have made the changes according to your suggestions and kind of understood your reasoning. I still haven't understood the asserts for the masked arrays though (all the candidates appeared as |
After reviewing this together with @adrinjalali, I have also modified the two not fully functioning tests I had idicated in the issue (#26885). Please have a look. :) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM on my side.
And you would need to solve the conflict in the changelog. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the fix! LGTM (looks good to me)
Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com>
I've finished the last few things. Thanks everyone for your support. :) |
…t of dicts (scikit-learn#26893) Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com>
…t of dicts (scikit-learn#26893) Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com>
…t of dicts (#26893) Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com>
…t of dicts (scikit-learn#26893) Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com>
What does this implement/fix? Explain your changes.
Closes #26885
Fixed that the
param_distribution
param ofHalvingRandomSearchCV
accepts lists of dicts and updated documentation.I also tried to implement a test, using
test_random_search_cv_results
as a template, as you suggested @glemaitre , but I encountered several problems, that I could not resolve.The template implementation calls two functions (
check_cv_results_array_types
andcheck_cv_results_keys
), that check and compare the occurrence of params. But those might not always be present (like 'param_degree' is only a key incv_results
for the poly kernel, not for rbf). (HalvingSerach' cv_results will also have two additional keys, compared to the other searches, these tests are used for: "iter", "n_resources")I cannot see a way to use the assert tests in the end of the template test, because HalvingGridSearchCV will mask part of the candidates, as part of the process. So, checking for this is not going to work, I assume.
I have determined the value for
n_proportion = 6
by looking atcv_results[key].shape,
which is for sure the wrong way around.I have commented the test out and hope for your advice. At the moment the test fails because of KeyError (
param_degree
) in one of the util functions.I could write a much simpler test, that captures the insertion of
param_distribution
as a list of dicts.