-
-
Notifications
You must be signed in to change notification settings - Fork 25.8k
Scalar fit_params no longer handled. Was: Singleton array (insert value here) cannot be considered a valid collection. #15805
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
Don't know why versions didn't come through: Python dependencies: Built with OpenMP: True |
Yes, I suppose we should continue to support at least non-iterable fit
params for those all-too-frequent not-strictly-compliant estimators.
|
I'm trying to work out if you're being sarcastic or not. If you're then thanks. |
Not being sarcastic. It's a bit frustrating that third party libraries
ignore our convention of only having sample-aligned arguments to fit, but I
think it is not great that we suddenly break compatibility.
|
Thanks, i think this is best sorted on the lightGBM git so i'll raise an issue there. For anyone who does come across this i have found a work around by making the below changes to sklearn/model_selection/_search.py -> fit. Currently at line 651 # make sure fit_params are sliceable
- fit_params_values = indexable(*fit_params.values())
- fit_params = dict(zip(fit_params.keys(), fit_params_values))
+# fit_params_values = indexable(*fit_params.values())
+# fit_params = dict(zip(fit_params.keys(), fit_params_values)) |
I'm really not sure if this is a regression on our side though. |
It's been raised 2-3 times in the couple of weeks since 0.22 was released, and not before. In 0.21.X: scikit-learn/sklearn/model_selection/_search.py Lines 630 to 632 in ee328fa
in 0.22.X scikit-learn/sklearn/model_selection/_search.py Lines 650 to 654 in bf24c7e
It does. But we have tacitly supported this behaviour for many, many releases and have changed the behaviour without warning. The support is more than tacit in the sense that scikit-learn/sklearn/model_selection/_validation.py Lines 940 to 944 in bf24c7e
Thus the previous behaviour could be understood as supported and intended behaviour, even though it was untested (with respect to search at least). Yes, we can change behaviour around things that do not conform to our conventions, but the change was introduced by @amueller in #14702 and was incidental to that PR. If we are going to change our handling of popular if non-conforming estimators, it should be done intentionally, and incidental changes should indeed be reverted in patch releases, IMO. |
+1 for at least restoring backward compat in 0.22.1 (to restore support for scalar fit params possibly with deprecation warning). +0 for keeping (undocumented) support for scalar fit params indefinitely without deprecation warning as they feel harmless to me. |
This also affects XGBoost as described in detail here: |
I think we should continue to support scalars, but consider deprecating support for array-likes of size != n_samples in 0.23. I think for 0.22 we need to ensure similar behaviour in search and other cross validation, which mean full backwards compatibility with 0.21 Also note that the documentation is completely opaque regarding the handling. |
The solution should introduce _check_fit_params to check consistent length / indexable while passing through scalars. |
Let's not deprecate non-aligned fit_params just yet. We need to carefully think about it first. Non-aligned fit_params is one proposition to implement the new warm start API #15105 We might also want to add feature-aligned params in the future, who knows |
Fair point
|
Sorry but do we mention this behavior in the doc? This seems strange. |
No this was never mentioned. We said "data-dependent". feature aligned arrays is still an option although we would have to discuss the use case. Another use case is how to pass a custom validation set ( |
@jnothman Could you please explain your comment "It's a bit frustrating that third party libraries
I saw "data-dependent" in our doc, but I think it's difficult to define what's "data-dependent", and I guess "data-dependent" parameters don't need to be indexable. |
I understand that we ordinarily avoid parameters to fit that are not sample-aligned. This is a convention that other estimator developers have not followed. In theory, order data-dependent parameters belong there but we have not done this afaik. |
Description
TypeError: Singleton array array(True) cannot be considered a valid collection.
Steps/Code to Reproduce
Found when running RandomizedSearchCV with LightGBM. Previously worked fine. Latest update requires that all the **fit_params be checked for 'slicability'. Difficult when some fit params are things like early_stopping_rounds = 5.
I've traced the error through and it starts in model_selection/_search.py ln652
-->
Expected Results
Expected to run the LightGBM wthrough RandomSearchGrid
Actual Results
TypeError: Singleton array array(True) cannot be considered a valid collection.
Versions
The text was updated successfully, but these errors were encountered: