8000 Document developer utils for parameter validation · Issue #27038 · scikit-learn/scikit-learn · GitHub
[go: up one dir, main page]

Skip to content

Document developer utils for parameter validation #27038

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
lorentzenchr opened this issue Aug 8, 2023 · 6 comments
Open

Document developer utils for parameter validation #27038

lorentzenchr opened this issue Aug 8, 2023 · 6 comments
Assignees
Labels
Documentation Validation related to input validation

Comments

@lorentzenchr
Copy link
Member

Describe the issue linked to the documentation

#22722 and #24862 introduced parameter validation.

These tools should be documented under https://scikit-learn.org/dev/developers/utilities.html#validation-tools.

In addition, a lot of tests for raising errors in case of bad input seems to disappear. Are there common tests for it or is it still recommended to write tests a la

def test_estim_raises():
    est = Estim(n_rounds="should be a number")
    with pytest.raises(ValueError):
        est.fit(X, y)

Suggest a potential alternative/fix

No response

@lorentzenchr lorentzenchr added Documentation Needs Triage Issue requires triage Validation related to input validation labels Aug 8, 2023
@glemaitre
Copy link
Member

We did not document it for the following reason:

  • we don't require third-party libraries to use the framework
  • the API was still experimental and was, up-to-now, private

In addition, a lot of tests for raising errors in case of bad input seems to disappear. Are there common tests for it or is it still recommended to write tests a la

We have common tests:

@pytest.mark.parametrize(
"estimator",
chain(
_tested_estimators(),
_generate_pipeline(),
_generate_column_transformer_instances(),
_generate_search_cv_instances(),
),
ids=_get_check_estimator_ids,
)
def test_check_param_validation(estimator):
name = estimator.__class__.__name__
_set_checking_parameters(estimator)
check_param_validation(name, estimator)

and the following file for the functions: https://github.com/scikit-learn/scikit-learn/blob/main/sklearn/tests/test_public_functions.py

@thomasjpfan thomasjpfan removed the Needs Triage Issue requires triage label Aug 11, 2023
@thomasjpfan
Copy link
Member

If the documentation is internal, then we can start with documentation at the top of the file:

We do something similar for the Pairwise distance ArgK min:

# Pairwise Distances Reductions
# =============================
#
# Author: Julien Jerphanion <git@jjerphan.xyz>
#
# Overview

@lorentzenchr
Copy link
Member Author

Yes, I meant in particular documentation for ourselves. If I were to write a new estimator class in scikit-learn, I would currently be lost and start by copy & paste an arbitrary existing class.

@glemaitre
Copy link
Member

OK, I totally see your point. I will address this issue.

@glemaitre
Copy link
Member

/take

@jeremiedbb
Copy link
Member

We could extend the "maintainers" section of the doc to explain things like that and other conventions we need to have in mind when reviewing PRs

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Documentation Validation related to input validation
Development

No branches or pull requests

4 participants
0