8000 MAINT Allow partial param validation for functions by jeremiedbb · Pull Request #25087 · scikit-learn/scikit-learn · GitHub
[go: up one dir, main page]

Skip to content

MAINT Allow partial param validation for functions #25087

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 13 commits into from
Dec 6, 2022

Conversation

jeremiedbb
Copy link
Member

Fixes #25058

We have public functions that are just wrappers around estimators, e.g k_means, fast_ica, non_negative_factorization, dict_learning_online, ... Adding full param validation for these estimators is not satisfying because it means that we need to keep in sync the dict of constraints in the class and in the function.

Current common tests ensure that we have as many constraints as parameters. To allow partial validation, this PR proposes to split the test for public functions, one for normal functions and one for functions that are wrappers around estimators. For the later, we don't enforce that the constraint dict of the function has as many entries as the number of params. Instead, we join the dict of the class and the dict of the function and make sure that at least one of them raises the appropriate error. This way, the function can delegate validation to the class for some parameters.

The issue with delegating validation to the class is that the error message mentions the name of the estimator instead of the name of the function which can be confusing. To avoid that, we catch the error raised and switch the estimator name by the function name in the error message. To make sure that we can only catch the error from param validation, I think it's better to change the ValueError that we used until now by a custom exception InvalidParameterError.

This PR also implements this for non_negative_factorization to show how it works. It already had the double validation but was not in the list of tested functions.

cc/ @glemaitre @thomasjpfan @adrinjalali

Copy link
Member
@thomasjpfan thomasjpfan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the PR!

I am okay with using regex to replace the estimator name with the function name.

@@ -16,6 +17,12 @@
from .validation import _is_arraylike_not_scalar


class InvalidParameterError(Exception):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changing to InvalidParameterError may break existing code that is catching ValueError. To be backward compatible, I think this needs to inherit from ValueError:

Suggested change
class InvalidParameterError(Exception):
class InvalidParameterError(ValueError):

With this inheritance, a lot of the tests can remain the same and look for ValueError.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are two aspects here:

  • we introduce the parameter validation only in 1.2 so we did not break code of people using the framework
  • we already broke the code of people if we search for the equivalence between the framework and manual checks. In imbalanced-learn we were checking for strings and sometimes we differed from ValueError to TypeError.

I am wondering if inheriting from both ValueError and TypeError would be better then?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm ok to inherit from TypeError and ValueError for backward compat. I made this change. Then I think we need to backport this into 1.2.X.

However I prefer to keep looking for InvalidParameterError in our test suite

Copy link
Member
@glemaitre glemaitre left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am +1 with this change. I don't have a strong opinion regarding the type of error raised.

@jeremiedbb jeremiedbb added this to the 1.2 milestone Dec 5, 2022
@jeremiedbb jeremiedbb added the To backport PR merged in master that need a backport to a release branch defined based on the milestone. label Dec 5, 2022
Copy link
Member
@thomasjpfan thomasjpfan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left a minor comment, otherwise this PR LGTM. I agree that the exception type 8000 makes this nice to have for 1.2.X.

As we add more parameter validation to functions, we will run into the reverse situation, where the estimator validates a parameter and calls a function that validates the same parameter. This will also end up with duplicate parameter constraints between the estimator and function. (This point is not a blocker for this PR)

@glemaitre glemaitre merged commit 14130f4 into scikit-learn:main Dec 6, 2022
@glemaitre
Copy link
Member

LGTM. Thanks @jeremiedbb

jeremiedbb added a commit to jeremiedbb/scikit-learn that referenced this pull request Dec 6, 2022
jjerphan pushed a commit 8EA0 to jjerphan/scikit-learn that referenced this pull request Jan 20, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
module:decomposition module:utils No Changelog Needed To backport PR merged in master that need a backport to a release branch defined based on the milestone. Validation related to input validation Waiting for Reviewer
Projects
None yet
Development

Successfully merging this pull request may close these issues.

parameters_to_validate to allow for partial validation in validation framework
3 participants
0