ValueError: `max_samples` must be in range (0, 1) but got value 1.0 for random forest · Issue #1086 · scikit-optimize/scikit-optimize
This repository was archived by the owner on Feb 28, 2024. It is now read-only.

ValueError: max_samples must be in range (0, 1) but got value 1.0 for random forest #1086

Closed
RNarayan73 opened this issue Oct 18, 2021 · 2 comments

Comments

@RNarayan73
RNarayan73 commented Oct 18, 2021

Hello,

I get an error when running BayesSearchCV with the RandomForestClassifier in Scikit Learn:

ValueError: max_samples must be in range (0, 1) but got value 1.0
It seems to be related to #1065 and #1067. I have installed the 0.9rc1 pre-release, which includes fixes for those, but my issue persists.

The hyperparameter max_samples for random forest is documented as follows:

max_samples: int or float, default=None
If bootstrap is True, the number of samples to draw from X to train each base estimator.
If None (default), then draw X.shape[0] samples.
If int, then draw max_samples samples.
If float, then draw max_samples * X.shape[0] samples. Thus, max_samples should be in the interval (0, 1).
New in version 0.22.
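For illustration, the three cases in the docstring above can be sketched in plain Python (a simplified sketch of the documented behaviour; `resolve_max_samples` is a hypothetical helper name, not scikit-learn's internal API):

```python
def resolve_max_samples(max_samples, n_total):
    """Mimic how the docstring says an int/float/None max_samples resolves."""
    if max_samples is None:
        # Default: draw all X.shape[0] samples
        return n_total
    if isinstance(max_samples, int):
        # int: draw exactly max_samples samples
        return max_samples
    # float: a fraction of the dataset, required to lie in the open
    # interval (0, 1) — which is why the value 1.0 is rejected
    if not 0.0 < max_samples < 1.0:
        raise ValueError(
            f"`max_samples` must be in range (0, 1) but got value {max_samples}"
        )
    return round(max_samples * n_total)

print(resolve_max_samples(0.5, 150))  # 75
```

For example, on a dataset with 150 rows (like iris), max_samples=0.5 resolves to 75 samples per tree, while 1.0 falls outside the open interval and raises.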

It also occurs for the ExtraTreesClassifier, which has a signature identical to random forest's.

But:

  • This issue doesn't occur for another hyperparameter, max_features, which has a similar 'int' / 'float' definition.
  • It also doesn't occur for max_samples when using an alternative SearchCV such as RandomizedSearchCV, TuneSearchCV, OptunaSearchCV, etc.

Here's a code snippet to replicate the error:

# Imports added for completeness (assumed from context)
from skopt import BayesSearchCV
from skopt.space import Real
from sklearn.ensemble import RandomForestClassifier

space = {
    'max_features': Real(low=0.1, high=1.0, prior='uniform'),
    'max_samples': Real(low=0.1, high=1.0, prior='uniform')
}

bay_pipe_test = BayesSearchCV(RandomForestClassifier(random_state=0, n_jobs=1),
                              search_spaces=space,
                              random_state=0,
                              n_jobs=4, verbose=1)

# X_train, y_train defined elsewhere
bay_pipe_test.fit(X_train, y_train)

Has anyone encountered this? Any suggestions on how to resolve it?

Narayan

@QuentinSoubeyran
Contributor

This is not a scikit-optimize issue; you are not using the estimators correctly. The notation (0, 1) denotes an open interval, meaning the bounds themselves are excluded: 1.0 is not a valid value for max_samples, and scikit-learn is telling you so.

There is no need to use any SearchCV to trigger that, just use a classifier with the invalid value:

from sklearn import datasets, ensemble

X, y = datasets.load_iris(return_X_y=True)
clf = ensemble.RandomForestClassifier(max_samples=1.0)
clf.fit(X, y)  # raises ValueError: `max_samples` must be in range (0, 1) but got value 1.0

This is not triggered by other SearchCV simply because they happen not to try the value 1.0. A simple solution to get your code to work properly is to use valid bounds, such as Real(1e-8, 1-1e-8).
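To illustrate the fix with scikit-learn alone (a hedged sketch reflecting the releases current at the time of this issue; later scikit-learn versions relaxed the check so that a float max_samples of 1.0 is accepted), any float strictly inside (0, 1) fits without error:

```python
from sklearn import datasets, ensemble

X, y = datasets.load_iris(return_X_y=True)

# An upper bound just below 1.0, as suggested above, is always valid.
clf = ensemble.RandomForestClassifier(
    max_samples=1 - 1e-8, n_estimators=10, random_state=0
)
clf.fit(X, y)
print(clf.score(X, y))
```

With 150 iris samples, round(150 * (1 - 1e-8)) still draws 150 samples per tree, so shrinking the bound this slightly costs nothing in practice.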

Note: you can enclose multiline code in triple backtick to have it properly formatted, see https://guides.github.com/features/mastering-markdown/

@RNarayan73
Author

Thanks @QuentinSoubeyran for your advice.
