BayesSearchCV multimetric scoring inconsistency: KeyError: 'mean_test_score' #1105

leweex95 · 2022-02-06T16:38:09Z

I am working on a multilabel classification problem and have so far relied mainly on RandomizedSearchCV from scikit-learn for hyperparameter optimization. I have now started experimenting with BayesSearchCV and ran into a potential bug when multi-metric scoring is combined with the refit argument.

A fully reproducible toy example follows.

Imports, data generation, pipeline:

import numpy as np
from sklearn.datasets import make_multilabel_classification
from sklearn.naive_bayes import MultinomialNB
from sklearn.multioutput import MultiOutputClassifier
from sklearn.model_selection import RandomizedSearchCV
from skopt.searchcv import BayesSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler

X, Y = make_multilabel_classification(
    n_samples=10000,
    n_features=20,
    n_classes=10,
    n_labels=3
)

pipe = Pipeline(
    steps = [
        ('scaler', MinMaxScaler()),
        ('model', MultiOutputClassifier(MultinomialNB()))
    ]
)

Step 1: Single metric with refit = True: works!

search = BayesSearchCV(
    estimator = pipe, 
    search_spaces = {'model__estimator__alpha': np.linspace(0.01,1,50)},
    scoring = 'precision_macro',
    refit = True,
    cv = 5
).fit(X, Y)

Step 2: Single metric with refit = 'precision_macro': works!

search = BayesSearchCV(
    estimator = pipe, 
    search_spaces = {'model__estimator__alpha': np.linspace(0.01,1,50)},
    scoring = 'precision_macro',
    refit = 'precision_macro',
    cv = 5
).fit(X, Y)

Step 3: Multiple metrics with refit = 'precision_macro': fails!

search = BayesSearchCV(
    estimator = pipe, 
    search_spaces = {'model__estimator__alpha': np.linspace(0.01,1,50)},
    scoring = ['precision_macro', 'recall_macro', 'accuracy'],
    refit = 'precision_macro',
    cv = 5
).fit(X, Y)

(Note: adding return_train_score=True to BayesSearchCV() didn't make a difference.)

The error message is:

  File "multioutput.py", line 46, in <module>
    ).fit(X, Y)
  File ".venv\lib\site-packages\skopt\searchcv.py", line 466, in fit
    super().fit(X=X, y=y, groups=groups, **fit_params)
  File ".venv\lib\site-packages\sklearn\model_selection\_search.py", line 891, in fit  
    self._run_search(evaluate_candidates)
  File ".venv\lib\site-packages\skopt\searchcv.py", line 514, in _run_search
    evaluate_candidates, n_points=n_points_adjusted
  File ".venv\lib\site-packages\skopt\searchcv.py", line 411, in _step
    local_results = all_results["mean_test_score"][-len(params):]
KeyError: 'mean_test_score'

For comparison, I ran the same setup through RandomizedSearchCV from scikit-learn:

search = RandomizedSearchCV(
    estimator = pipe,
    param_distributions = {'model__estimator__alpha': np.linspace(0.01,1,50)},
    scoring = ['precision_macro', 'recall_macro', 'accuracy'],
    refit = 'precision_macro',
    cv = 5
).fit(X, Y)

and it evaluated correctly.
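A likely reason for the difference: when scoring is a list, scikit-learn stores the results under per-metric keys such as mean_test_precision_macro, and cv_results_ has no mean_test_score key, which is the key looked up at line 411 of skopt\searchcv.py in the traceback. The sketch below checks this with RandomizedSearchCV only; the smaller dataset, n_iter=3, cv=3 and the random_state values are assumptions chosen to keep it quick, and the pipeline is reduced to the classifier.

```python
import numpy as np
from sklearn.datasets import make_multilabel_classification
from sklearn.model_selection import RandomizedSearchCV
from sklearn.multioutput import MultiOutputClassifier
from sklearn.naive_bayes import MultinomialNB

# Small synthetic problem (sizes are illustrative, not the ones from the report).
X, Y = make_multilabel_classification(
    n_samples=200, n_features=20, n_classes=5, n_labels=2, random_state=0
)

search = RandomizedSearchCV(
    estimator=MultiOutputClassifier(MultinomialNB()),
    param_distributions={'estimator__alpha': np.linspace(0.01, 1, 50)},
    scoring=['precision_macro', 'recall_macro', 'accuracy'],
    refit='precision_macro',
    n_iter=3,
    cv=3,
    random_state=0,
).fit(X, Y)

# With multi-metric scoring the mean test scores are stored per metric.
mean_keys = sorted(k for k in search.cv_results_ if k.startswith('mean_test_'))
print(mean_keys)

# The generic key only exists for single-metric searches.
print('mean_test_score' in search.cv_results_)
```

Code that reads cv_results_ after a multi-metric search therefore has to pick the key that matches the refit metric, for example 'mean_test_' + 'precision_macro'.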

My version of scikit-optimize: 0.9.0.
OS: Windows 10
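Until the multi-metric path works, one possible workaround is to run the search with a single metric (as in Steps 1 and 2) and then cross-validate the best configuration on the remaining metrics. This is only a sketch: report_extra_metrics is a hypothetical helper name, and RandomizedSearchCV stands in for the single-metric BayesSearchCV so that the example depends on scikit-learn alone. Any fitted search object that exposes best_estimator_ should work the same way.

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import make_multilabel_classification
from sklearn.model_selection import RandomizedSearchCV, cross_validate
from sklearn.multioutput import MultiOutputClassifier
from sklearn.naive_bayes import MultinomialNB


def report_extra_metrics(fitted_search, X, Y, metrics, cv=5):
    """Cross-validate the best configuration of a fitted search on several metrics.

    Hypothetical helper: requires that the search was run with refit enabled,
    so that best_estimator_ is available.
    """
    best = clone(fitted_search.best_estimator_)  # unfitted copy with the best parameters
    results = cross_validate(best, X, Y, scoring=metrics, cv=cv)
    return {m: float(results[f'test_{m}'].mean()) for m in metrics}


# Small synthetic problem (sizes are illustrative, not the ones from the report).
X, Y = make_multilabel_classification(
    n_samples=200, n_features=20, n_classes=5, n_labels=2, random_state=0
)

# Stand-in for a single-metric BayesSearchCV run (assumption for this sketch).
search = RandomizedSearchCV(
    estimator=MultiOutputClassifier(MultinomialNB()),
    param_distributions={'estimator__alpha': np.linspace(0.01, 1, 50)},
    scoring='precision_macro',
    refit=True,
    n_iter=3,
    cv=3,
    random_state=0,
).fit(X, Y)

extra = report_extra_metrics(
    search, X, Y, ['precision_macro', 'recall_macro', 'accuracy'], cv=3
)
print(extra)
```

This reports the extra metrics for the winning candidate only; per-candidate values for the other metrics are not recovered this way.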

RNarayan73 · 2022-04-06T20:05:41Z

@leweex95
This may be addressed by an unofficial fork here #1071
I use it too. Hope this helps.
Narayan

epenney1 · 2023-02-16T09:40:13Z

Hi, this is still an issue for me as well. Could you point me to where exactly I should download a version without this fault? Thanks.

RNarayan73 · 2023-02-18T15:51:17Z

> Hi, this is still an issue for me as well. Could you point me to where exactly I should download a version without this fault? Thanks.

@epenney1 see link #1071 which has a pip install command for the unofficial version

Hope this helps.
