Unclear message regarding param validation · Issue #26897 · scikit-learn/scikit-learn · GitHub

Unclear message regarding param validation #26897

adrinjalali opened this issue Jul 25, 2023 · 5 comments
Labels
Needs Investigation (issue requires investigation), Validation (related to input validation)

Comments

@adrinjalali
Member

In the context of #26896 I wrote a test and got a message which I'm really puzzled about. The error message says: ValueError: No valid specification of the columns. Only a scalar, list or slice of all integers or all strings, or boolean mask is allowed

This is the code:

import numpy as np

from sklearn import set_config
from sklearn.model_selection import cross_validate

from sklearn.tests.test_metaestimators_metadata_routing import (
    ConsumingClassifier,
    ConsumingScorer,
    ConsumingSplitter,
)

set_config(enable_metadata_routing=True)

X = np.ones((10, 2))
y = np.array([0, 0, 1, 1, 2, 2, 3, 3, 4, 4])

scorer = ConsumingScorer().set_score_request(
    sample_weight="score_weights", metadata="score_metadata"
)
splitter = ConsumingSplitter().set_split_request(
    groups="split_groups", metadata="split_metadata"
)
estimator = ConsumingClassifier().set_fit_request(
    sample_weight="fit_sample_weight", metadata="fit_metadata"
)
n_samples = len(X)
rng = np.random.RandomState(0)
score_weights = rng.rand(n_samples)
score_metadata = rng.rand(n_samples)
split_groups = rng.randint(0, 3, n_samples)
split_metadata = rng.rand(n_samples)
fit_sample_weight = rng.rand(n_samples)
fit_metadata = rng.rand(n_samples)

cross_validate(
    estimator,
    X=X,
    y=y,
    scoring=scorer,
    cv=splitter,
    params=dict(
        score_weights=score_weights,
        score_metadata=score_metadata,
        split_groups=split_groups,
        split_metadata=split_metadata,
        fit_sample_weight=fit_sample_weight,
        fit_metadata=fit_metadata,
    ),
)

And the error message:

$ python /tmp/1.py
Traceback (most recent call last):
  File "/tmp/1.py", line 35, in <module>
    cross_validate(
  File "/home/adrin/Projects/sklearn/scikit-learn/sklearn/utils/_param_validation.py", line 211, in wrapper
    return func(*args, **kwargs)
  File "/home/adrin/Projects/sklearn/scikit-learn/sklearn/model_selection/_validation.py", line 406, in cross_validate
    results = parallel(
  File "/home/adrin/Projects/sklearn/scikit-learn/sklearn/utils/parallel.py", line 65, in __call__
    return super().__call__(iterable_with_config)
  File "/home/adrin/miniforge3/envs/sklearn/lib/python3.10/site-packages/joblib/parallel.py", line 1085, in __call__
    if self.dispatch_one_batch(iterator):
  File "/home/adrin/miniforge3/envs/sklearn/lib/python3.10/site-packages/joblib/parallel.py", line 901, in dispatch_one_batch
    self._dispatch(tasks)
  File "/home/adrin/miniforge3/envs/sklearn/lib/python3.10/site-packages/joblib/parallel.py", line 819, in _dispatch
    job = self._backend.apply_async(batch, callback=cb)
  File "/home/adrin/miniforge3/envs/sklearn/lib/python3.10/site-packages/joblib/_parallel_backends.py", line 208, in apply_async
    result = ImmediateResult(func)
  File "/home/adrin/miniforge3/envs/sklearn/lib/python3.10/site-packages/joblib/_parallel_backends.py", line 597, in __init__
    self.results = batch()
  File "/home/adrin/miniforge3/envs/sklearn/lib/python3.10/site-packages/joblib/parallel.py", line 288, in __call__
    return [func(*args, **kwargs)
  File "/home/adrin/miniforge3/envs/sklearn/lib/python3.10/site-packages/joblib/parallel.py", line 288, in <listcomp>
    return [func(*args, **kwargs)
  File "/home/adrin/Projects/sklearn/scikit-learn/sklearn/utils/parallel.py", line 127, in __call__
    return self.function(*args, **kwargs)
  File "/home/adrin/Projects/sklearn/scikit-learn/sklearn/model_selection/_validation.py", line 838, in _fit_and_score
    fit_params = _check_method_params(X, params=fit_params, indices=train)
  File "/home/adrin/Projects/sklearn/scikit-learn/sklearn/utils/validation.py", line 1983, in _check_method_params
    method_params_validated[param_key] = _safe_indexing(
  File "/home/adrin/Projects/sklearn/scikit-learn/sklearn/utils/__init__.py", line 341, in _safe_indexing
    indices_dtype = _determine_key_type(indices)
  File "/home/adrin/Projects/sklearn/scikit-learn/sklearn/utils/__init__.py", line 288, in _determine_key_type
    raise ValueError(err_msg)
ValueError: No valid specification of the columns. Only a scalar, list or slice of all integers or all strings, or boolean mask is allowed

cc @jeremiedbb

github-actions bot added the Needs Triage label on Jul 25, 2023
@adrinjalali
Member Author

The issue was that the splitter was returning a range(...), and not a list/array/etc. But I'm still very confused by the error message.
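To illustrate how a `range` can end up with this message, here is a minimal sketch of a key-type check. It is a simplified, hypothetical stand-in written for this thread (the name `determine_key_type_sketch` is invented), not the actual body of `_determine_key_type`. It only handles scalars, slices, lists and tuples, so any other container falls through to the generic error.

```python
def determine_key_type_sketch(key):
    """Classify an indexing key as 'int', 'str' or 'bool', or raise.

    Hypothetical simplification; not scikit-learn's implementation.
    """
    err_msg = (
        "No valid specification of the columns. Only a scalar, list or "
        "slice of all integers or all strings, or boolean mask is allowed"
    )
    # Scalars: check bool before int because bool is a subclass of int.
    for scalar_type, name in ((bool, "bool"), (int, "int"), (str, "str")):
        if isinstance(key, scalar_type):
            return name
    if isinstance(key, slice):
        bounds = [b for b in (key.start, key.stop) if b is not None]
        if all(isinstance(b, int) for b in bounds):
            return "int"
        if all(isinstance(b, str) for b in bounds):
            return "str"
        raise ValueError(err_msg)
    if isinstance(key, (list, tuple)):
        kinds = {determine_key_type_sketch(item) for item in key}
        if len(kinds) == 1:
            return kinds.pop()
        raise ValueError(err_msg)
    # Anything else, including range, reaches the generic error.
    raise ValueError(err_msg)


print(determine_key_type_sketch([0, 1, 2]))
try:
    determine_key_type_sketch(range(3))
except ValueError as exc:
    print("range rejected:", exc)
```

In this sketch the list of integers is classified normally, while `range(3)` triggers the same wording as in the traceback, with no mention of the offending type.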

@jeremiedbb
Member

I don't find it that confusing given that it's kind of an internal error. It's raised by a private function so it's up to us to provide appropriate params and I find that the error describes well the type of allowed column keys. The phrasing could probably be improved a bit though.

What I find confusing is that the docstring says array-like is valid and range fits in the description of an array-like but it looks like _determine_key_type and hence _safe_indexing doesn't deal with range. I think range should be valid indices given that we accept array of ints. What do you think @glemaitre ?

@glemaitre
Member
glemaitre commented Jul 26, 2023

> What I find confusing is that the docstring says array-like is valid and range fits in the description of an array-like but it looks like _determine_key_type and hence _safe_indexing doesn't deal with range. I think range should be valid indices given that we accept array of ints. What do you think @glemaitre ?

I agree that supporting range would make sense. It has a length and can be consumed multiple times so I think that we could add support for it.
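The two properties mentioned here (a known length and repeatable iteration) can be checked in plain Python. The sketch below also shows one possible way to accept a `range`, by converting it to a list before indexing. The helper `normalize_indices` is hypothetical and only meant as an illustration, not as the change that would be made in scikit-learn.

```python
def normalize_indices(indices):
    """Hypothetical helper: convert a range to a list, pass others through."""
    if isinstance(indices, range):
        return list(indices)
    return indices


train = range(0, 6, 2)

# A range has a length and can be iterated more than once.
assert len(train) == 3
assert list(train) == [0, 2, 4]
assert list(train) == [0, 2, 4]

# A generator, by contrast, is exhausted after a single pass.
gen = (i for i in range(3))
assert list(gen) == [0, 1, 2]
assert list(gen) == []

# After normalization, the range can be used like any list of integers.
data = ["a", "b", "c", "d", "e", "f"]
selected = [data[i] for i in normalize_indices(train)]
print(selected)
```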

@glemaitre
Member

> The phrasing could probably be improved a bit though.

The missing info there is the problematic type, because we don't know at first glance what it is.

@adrinjalali
Member Author

It was particularly confusing since, from the script I pasted here, I had no idea which part was causing the issue. It'd be nice to have a hint of which method / class is causing it, and what the actual given type is.
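As a rough sketch of what such a message could look like, the check below names both the caller and the type it received. The function `check_indices_sketch` and its `caller` argument are hypothetical, and the caller string used in the example is only an illustrative label. Nothing here reflects an existing scikit-learn API.

```python
def check_indices_sketch(indices, *, caller):
    """Hypothetical check that reports the caller and the received type."""
    allowed = (int, str, bool, list, tuple, slice)
    if not isinstance(indices, allowed):
        raise ValueError(
            f"{caller}: got indices of type {type(indices).__name__!r}. "
            "Only a scalar, list or slice of all integers or all strings, "
            "or boolean mask is allowed."
        )
    return indices


try:
    # The caller label is illustrative; it is passed in by hand here.
    check_indices_sketch(range(10), caller="ConsumingSplitter.split")
except ValueError as exc:
    message = str(exc)
    print(message)
```

With this wording, the message would point at the splitter and at `range` directly, which is the information that was missing when reading the traceback.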

thomasjpfan added the Validation and Needs Investigation labels and removed the Needs Triage label on Jul 27, 2023