8000 `parameters_to_validate` to allow for partial validation in validation framework · Issue #25058 · scikit-learn/scikit-learn · GitHub
[go: up one dir, main page]

Skip to content

parameters_to_validate to allow for partial validation in validation framework #25058

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
thomasjpfan opened this issue Nov 28, 2022 · 6 comments · Fixed by #25087
Closed

parameters_to_validate to allow for partial validation in validation framework #25058

thomasjpfan opened this issue Nov 28, 2022 · 6 comments · Fixed by #25087
Labels
Needs Decision - Include Feature Requires decision regarding including feature New Feature

Comments

@thomasjpfan
Copy link
Member

In some PRs for adding validation to functions: #24924 or #25026, there will be double validation, by the function and another by the estimator. @adrinjalali and I share the same concern about keeping constraints in sync between the function and the estimator: #24924 (comment), #25026 (comment).

The test for validation is very strict about including a constraint for every parameter. I see the purpose of having this strictness is such that when a new parameter is added, then a constraint is also included.

I propose adding a parameters_to_validate to validate_params which is a collection of strings that states which parameters should be validated. This way when a new parameter is added, the default is to add a constraint for the new parameter, but it can be ignored by adding it to the parameters_to_validate list.

CC @jeremiedbb @glemaitre @adrinjalali

@github-actions github-actions bot added the Needs Triage Issue requires triage label Nov 28, 2022
@thomasjpfan thomasjpfan added New Feature Needs Decision - Include Feature Requires decision regarding including feature and removed Needs Triage Issue requires triage labels Nov 28, 2022
@adrinjalali
Copy link
Member

instead of parameters_to_validate, we can also add something like param: "ignore", and have that meaning "this method doesn't validate this parameter, but something it calls does"

@jeremiedbb
Copy link
Member

we can also add something like param: "ignore"

We already have that :)
"param": "no_validation"

I think it would be the most appropriate because it really means delegate validation to whatever uses this param. For instance, we use it for the dtype param and delegate validation to numpy. Here it would mean delegate validation to the class.

@thomasjpfan
Copy link
Member Author

We already have that :) "param": "no_validation"

Ah yes. Moving forward, I will start recommending "no_validation" in cases where there is double validation (with a inline comment that states that the validation is delegated).

@jeremiedbb
Copy link
Member

I'm reopening because it's actually not fully satisfying.

If we put "no_validation", when the user gives an invalid param, the error message will be The <param_name> parameter of NMF should be but the user is actually calling non_negative_factorization. I think it brings confusion. Wdyt ?

@jeremiedbb
Copy link
Member
jeremiedbb commented Nov 30, 2022

@glemaitre also found that having long lists of "no_validation" was non very satisfying either and can be error prone because if a contributor put "no_validation" for a parameter that is actually not in the class, then it will never be validated and no test would catch that.

Thus we thought that we could allow to only pass a fraction of the parameters to the function constraint dict and the common test would take the constraint dict of the class and the function, merge them (giving priority to the function for duplicated entries), and then proceed with the checks.

(It does solve the issue I raised in the previous message though)

@jeremiedbb
Copy link
Member
jeremiedbb commented Nov 30, 2022

Side note: my favorite solution would be to deprecate all these functions that are just wrappers around estimators 😄
but the discussions in #14897 did not seem to converge toward a clear consensus :/

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Needs Decision - Include Feature Requires decision regarding including feature New Feature
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants
0