ValueError: `max_samples` must be in range (0, 1) but got value 1.0 for random forest · Issue #1086 · scikit-optimize/scikit-optimize
This repository was archived by the owner on Feb 28, 2024. It is now read-only.

ValueError: max_samples must be in range (0, 1) but got value 1.0 for random forest #1086

Closed
RNarayan73 opened this issue Oct 18, 2021 · 2 comments

Comments

@RNarayan73
RNarayan73 commented Oct 18, 2021

Hello,

I get an error when running BayesSearchCV with the RandomForestClassifier in Scikit Learn:

ValueError: max_samples must be in range (0, 1) but got value 1.0
It seems to be related to #1065 and #1067. I have installed the 0.9rc1 pre-release, which includes fixes for those, but my issue persists.

The hyperparameter max_samples for random forest is documented as follows:

max_samples: int or float, default=None
If bootstrap is True, the number of samples to draw from X to train each base estimator.
If None (default), then draw X.shape[0] samples.
If int, then draw max_samples samples.
If float, then draw max_samples * X.shape[0] samples. Thus, max_samples should be in the interval (0, 1).
New in version 0.22.
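For illustration, the three cases in the docstring above can be sketched in plain Python (a simplified sketch of the documented behaviour; `resolve_max_samples` is a hypothetical helper name, not scikit-learn's internal API):

```python
def resolve_max_samples(max_samples, n_total):
    """Mimic how the docstring says an int/float/None max_samples resolves."""
    if max_samples is None:
        # Default: draw all X.shape[0] samples
        return n_total
    if isinstance(max_samples, int):
        # int: draw exactly max_samples samples
        return max_samples
    # float: a fraction of the dataset, required to lie in the open
    # interval (0, 1) — which is why the value 1.0 is rejected
    if not 0.0 < max_samples < 1.0:
        raise ValueError(
            f"`max_samples` must be in range (0, 1) but got value {max_samples}"
        )
    return round(max_samples * n_total)

print(resolve_max_samples(0.5, 150))  # 75
```

For example, on a dataset with 150 rows (like iris), max_samples=0.5 resolves to 75 samples per tree, while 1.0 falls outside the open interval and raises.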

It also occurs for the ExtraTreesClassifier, which has a signature identical to random forest's.

But:

  • This issue doesn't occur for another hyperparameter, max_features, which has a similar 'int' / 'float' definition.
  • It also doesn't occur for max_samples when using an alternative SearchCV such as RandomizedSearchCV, TuneSearchCV, OptunaSearchCV, etc.

Here's a code snippet to replicate the error:

# Imports added for completeness (assumed from context)
from skopt import BayesSearchCV
from skopt.space import Real
from sklearn.ensemble import RandomForestClassifier

space = {
    'max_features': Real(low=0.1, high=1.0, prior='uniform'),
    'max_samples': Real(low=0.1, high=1.0, prior='uniform')
}

bay_pipe_test = BayesSearchCV(RandomForestClassifier(random_state=0, n_jobs=1),
                              search_spaces=space,
                              random_state=0,
                              n_jobs=4, verbose=1)

# X_train, y_train defined elsewhere
bay_pipe_test.fit(X_train, y_train)

Has anyone encountered this? Any suggestions on how to resolve it?

Narayan

@QuentinSoubeyran
Contributor

This is not a scikit-optimize issue; you are not using the estimators correctly. The notation (0, 1) denotes an open interval, meaning the bounds themselves are excluded: 1.0 is not a valid value for max_samples, and scikit-learn is telling you so.

There is no need to use any SearchCV to trigger that, just use a classifier with the invalid value:

from sklearn import datasets, ensemble

X, y = datasets.load_iris(return_X_y=True)
clf = ensemble.RandomForestClassifier(max_samples=1.0)
clf.fit(X, y)  # raises ValueError: `max_samples` must be in range (0, 1) but got value 1.0

This is not triggered by other SearchCV simply because they happen not to try the value 1.0. A simple solution to get your code to work properly is to use valid bounds, such as Real(1e-8, 1-1e-8).
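To illustrate the fix with scikit-learn alone (a hedged sketch reflecting the releases current at the time of this issue; later scikit-learn versions relaxed the check so that a float max_samples of 1.0 is accepted), any float strictly inside (0, 1) fits without error:

```python
from sklearn import datasets, ensemble

X, y = datasets.load_iris(return_X_y=True)

# An upper bound just below 1.0, as suggested above, is always valid.
clf = ensemble.RandomForestClassifier(
    max_samples=1 - 1e-8, n_estimators=10, random_state=0
)
clf.fit(X, y)
print(clf.score(X, y))
```

With 150 iris samples, round(150 * (1 - 1e-8)) still draws 150 samples per tree, so shrinking the bound this slightly costs nothing in practice.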

Note: you can enclose multiline code in triple backtick to have it properly formatted, see https://guides.github.com/features/mastering-markdown/

@RNarayan73
Author

Thanks @QuentinSoubeyran for your advice.
