Unclear message regarding param validation #26897

adrinjalali · 2023-07-25T19:20:12Z

In the context of #26896, I wrote a test and got an error message that really puzzles me: ValueError: No valid specification of the columns. Only a scalar, list or slice of all integers or all strings, or boolean mask is allowed

This is the code:

import numpy as np

from sklearn import set_config
from sklearn.model_selection import cross_validate

from sklearn.tests.test_metaestimators_metadata_routing import (
    ConsumingClassifier,
    ConsumingScorer,
    ConsumingSplitter,
)

set_config(enable_metadata_routing=True)

X = np.ones((10, 2))
y = np.array([0, 0, 1, 1, 2, 2, 3, 3, 4, 4])

scorer = ConsumingScorer().set_score_request(
    sample_weight="score_weights", metadata="score_metadata"
)
splitter = ConsumingSplitter().set_split_request(
    groups="split_groups", metadata="split_metadata"
)
estimator = ConsumingClassifier().set_fit_request(
    sample_weight="fit_sample_weight", metadata="fit_metadata"
)
n_samples = len(X)
rng = np.random.RandomState(0)
score_weights = rng.rand(n_samples)
score_metadata = rng.rand(n_samples)
split_groups = rng.randint(0, 3, n_samples)
split_metadata = rng.rand(n_samples)
fit_sample_weight = rng.rand(n_samples)
fit_metadata = rng.rand(n_samples)

cross_validate(
    estimator,
    X=X,
    y=y,
    scoring=scorer,
    cv=splitter,
    params=dict(
        score_weights=score_weights,
        score_metadata=score_metadata,
        split_groups=split_groups,
        split_metadata=split_metadata,
        fit_sample_weight=fit_sample_weight,
        fit_metadata=fit_metadata,
    ),
)

And the error message:

$ python /tmp/1.py
Traceback (most recent call last):
  File "/tmp/1.py", line 35, in <module>
    cross_validate(
  File "/home/adrin/Projects/sklearn/scikit-learn/sklearn/utils/_param_validation.py", line 211, in wrapper
    return func(*args, **kwargs)
  File "/home/adrin/Projects/sklearn/scikit-learn/sklearn/model_selection/_validation.py", line 406, in cross_validate
    results = parallel(
  File "/home/adrin/Projects/sklearn/scikit-learn/sklearn/utils/parallel.py", line 65, in __call__
    return super().__call__(iterable_with_config)
  File "/home/adrin/miniforge3/envs/sklearn/lib/python3.10/site-packages/joblib/parallel.py", line 1085, in __call__
    if self.dispatch_one_batch(iterator):
  File "/home/adrin/miniforge3/envs/sklearn/lib/python3.10/site-packages/joblib/parallel.py", line 901, in dispatch_one_batch
    self._dispatch(tasks)
  File "/home/adrin/miniforge3/envs/sklearn/lib/python3.10/site-packages/joblib/parallel.py", line 819, in _dispatch
    job = self._backend.apply_async(batch, callback=cb)
  File "/home/adrin/miniforge3/envs/sklearn/lib/python3.10/site-packages/joblib/_parallel_backends.py", line 208, in apply_async
    result = ImmediateResult(func)
  File "/home/adrin/miniforge3/envs/sklearn/lib/python3.10/site-packages/joblib/_parallel_backends.py", line 597, in __init__
    self.results = batch()
  File "/home/adrin/miniforge3/envs/sklearn/lib/python3.10/site-packages/joblib/parallel.py", line 288, in __call__
    return [func(*args, **kwargs)
  File "/home/adrin/miniforge3/envs/sklearn/lib/python3.10/site-packages/joblib/parallel.py", line 288, in <listcomp>
    return [func(*args, **kwargs)
  File "/home/adrin/Projects/sklearn/scikit-learn/sklearn/utils/parallel.py", line 127, in __call__
    return self.function(*args, **kwargs)
  File "/home/adrin/Projects/sklearn/scikit-learn/sklearn/model_selection/_validation.py", line 838, in _fit_and_score
    fit_params = _check_method_params(X, params=fit_params, indices=train)
  File "/home/adrin/Projects/sklearn/scikit-learn/sklearn/utils/validation.py", line 1983, in _check_method_params
    method_params_validated[param_key] = _safe_indexing(
  File "/home/adrin/Projects/sklearn/scikit-learn/sklearn/utils/__init__.py", line 341, in _safe_indexing
    indices_dtype = _determine_key_type(indices)
  File "/home/adrin/Projects/sklearn/scikit-learn/sklearn/utils/__init__.py", line 288, in _determine_key_type
    raise ValueError(err_msg)
ValueError: No valid specification of the columns. Only a scalar, list or slice of all integers or all strings, or boolean mask is allowed

cc @jeremiedbb


adrinjalali · 2023-07-25T20:05:47Z

The issue was that the splitter was returning a range(...), and not a list/array/etc. But I'm still very confused by the error message.
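For anyone hitting the same error from a custom splitter, one possible workaround is to yield NumPy integer arrays from `split` instead of `range` objects. The minimal sketch below makes that assumption. `ArrayYieldingSplitter` is hypothetical: it is not the `ConsumingSplitter` from the test module, whose code isn't shown here. It implements a naive two-fold split only to illustrate the return types.

```python
import numpy as np


class ArrayYieldingSplitter:
    """Hypothetical two-fold splitter used only to illustrate return types.

    It follows the usual ``split`` / ``get_n_splits`` interface of
    scikit-learn CV splitters, but yields integer NumPy arrays rather
    than ``range`` objects.
    """

    def split(self, X, y=None, groups=None):
        n_samples = len(X)
        half = n_samples // 2
        # Integer arrays instead of range(0, half) / range(half, n_samples);
        # per this thread, range objects triggered the ValueError above.
        first, second = np.arange(half), np.arange(half, n_samples)
        yield first, second
        yield second, first

    def get_n_splits(self, X=None, y=None, groups=None):
        return 2


X = np.ones((10, 2))
for train, test in ArrayYieldingSplitter().split(X):
    print(type(train).__name__, train, test)
```

The split logic is deliberately trivial. Only the types of the yielded indices matter for this issue.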

jeremiedbb · 2023-07-26T13:45:37Z

I don't find it that confusing, given that it's essentially an internal error. It's raised by a private function, so it's up to us to pass appropriate params, and I find that the error describes the allowed column key types well. The phrasing could probably be improved a bit, though.

What I find confusing is that the docstring says array-like is valid, and a range fits the description of an array-like, but it looks like _determine_key_type (and hence _safe_indexing) doesn't handle range. I think a range should be accepted as indices, given that we accept arrays of ints. What do you think, @glemaitre?
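A minimal sketch of what accepting `range` could look like, assuming the fix converts it into an integer array before the key type is determined. `normalize_indices` is a hypothetical helper, not an existing scikit-learn function, and this is not how `_determine_key_type` is actually implemented.

```python
import numpy as np


def normalize_indices(indices):
    """Hypothetical pre-processing step for row indices (not scikit-learn code).

    A ``range`` only ever contains integers, so it can be turned into an
    equivalent 1-D integer array. Every other key is returned unchanged
    and left to the existing type checks.
    """
    if isinstance(indices, range):
        return np.asarray(indices)
    return indices


sample_weight = np.linspace(0.0, 0.9, 10)
train = normalize_indices(range(2, 5))
print(train, sample_weight[train])
```

Converting up front keeps the rest of the dispatch untouched: after this step a `range` is indistinguishable from the integer arrays that are already accepted.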

glemaitre · 2023-07-26T14:09:14Z

> What I find confusing is that the docstring says array-like is valid and range fits in the description of an array-like but it looks like _determine_key_type and hence _safe_indexing doesn't deal with range. I think range should be valid indices given that we accept array of ints. What do you think @glemaitre ?

I agree that supporting range would make sense. It has a length and can be consumed multiple times, so I think we could add support for it.
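The two properties mentioned here are easy to check with built-ins alone. A `range` supports `len()` and yields the same values every time it is iterated. A generator with the same values has no length and is exhausted after a single pass.

```python
# A range knows its length without being consumed ...
train = range(2, 6)
print(len(train))

# ... and yields the same values on every pass.
first_pass = list(train)
second_pass = list(train)
print(first_pass == second_pass)

# A generator with the same values is exhausted after one pass.
gen = (i for i in range(2, 6))
print(list(gen), list(gen))
```

This re-iterability is what separates `range` from one-shot iterators, which could not safely be indexed into more than once.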

glemaitre · 2023-07-26T14:11:31Z

> The phrasing could probably be improved a bit though.

The missing information is the problematic type: at first glance, we don't know what it is.

adrinjalali · 2023-07-26T18:25:04Z

It was particularly confusing since, from the script I pasted here, I had no idea which part was causing the issue. It'd be nice to have a hint of which method/class is causing it and what type was actually passed.
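To illustrate the suggestion, here is a hypothetical, heavily simplified key-type check whose error message names both the offending type and the caller. `check_index_key` and its `caller` parameter are invented for this sketch. The real `_determine_key_type` handles more cases and has a different signature.

```python
import numbers

import numpy as np


def check_index_key(key, caller="indexing"):
    """Hypothetical simplified key-type check.

    Returns 'slice', 'bool', 'int' or 'str'; otherwise raises a ValueError
    that names the caller and the type that was actually passed.
    """
    if isinstance(key, slice):
        return "slice"
    # bool is a subclass of int, so it must be tested before Integral.
    if isinstance(key, (bool, np.bool_)):
        return "bool"
    if isinstance(key, numbers.Integral):
        return "int"
    if isinstance(key, str):
        return "str"
    if isinstance(key, (list, tuple, np.ndarray)):
        kind = np.asarray(key).dtype.kind
        if kind == "b":
            return "bool"
        if kind in "iu":
            return "int"
        # Mixed int/str lists also become a "U" array, so check every element.
        if kind == "U" and all(isinstance(k, str) for k in key):
            return "str"
    raise ValueError(
        f"{caller} got indices of type {type(key).__name__!r}. Only a scalar, "
        "a list or slice of all integers or all strings, or a boolean mask is "
        "allowed."
    )


print(check_index_key([0, 1, 2]))
try:
    check_index_key(range(3), caller="_fit_and_score (train indices)")
except ValueError as exc:
    print(exc)
```

With a message like this, the traceback in this issue would have pointed directly at the `range` coming out of the splitter.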

github-actions bot added the Needs Triage label on Jul 25, 2023.

thomasjpfan added the Validation and Needs Investigation labels and removed the Needs Triage label on Jul 27, 2023.
