GridSearchCV does not seem to recognize whether estimators from StackingClassifier are fitted or not · Issue #24409 · scikit-learn/scikit-learn · GitHub
There seems to be a bug with the combination of GridSearchCV and StackingClassifier when the parameter cv of StackingClassifier is set to 'prefit'. With this option, the estimators of the StackingClassifier should be fitted before fitting the stacked model, and only the final_estimator would then be fitted. When including the StackingClassifier within GridSearchCV however, the fact that estimators have already been fitted does not seem to be recognized.
Steps/Code to Reproduce
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.ensemble import GradientBoostingClassifier, StackingClassifier
from sklearn.model_selection import GridSearchCV

# Creating toy data set
n_features = 3
n_instances = 40
train = np.random.rand(n_instances, n_features)
label = np.random.randint(0, 2, n_instances)

# Declaring estimators
log_clf = LogisticRegression()
gau_clf = GaussianNB()

# Fitting estimators
log_clf.fit(train, label)
gau_clf.fit(train, label)

# Creating stacked model
estimators = [
    ("log", log_clf),
    ("gau", gau_clf),
]
boost = GradientBoostingClassifier()
stack = StackingClassifier(estimators=estimators, final_estimator=boost, cv='prefit')

# Creating the Grid CV
param_search = {'final_estimator__max_depth': [1]}
gridcv = GridSearchCV(stack, param_grid=param_search)

# Fitting the stack and gridcv models
stack.fit(train, label)  # works fine
gridcv.fit(train, label)  # sklearn.exceptions.NotFittedError: This LogisticRegression instance is not fitted yet. Call 'fit' with appropriate arguments before using this estimator
Expected Results
In the above code, fitting the stack model directly works fine and no error about the estimators' fitted state is raised.
However, when the stack is wrapped in GridSearchCV, the estimators are not recognized as already fitted and a sklearn.exceptions.NotFittedError is raised.
Actual Results
Traceback (most recent call last):
File "c:\Users\levesque\Documents\Python\AMEX\errorExample.py", line 39, in <module>
gridcv.fit(train, label) # sklearn.exceptions.NotFittedError: This LogisticRegression instance is not fitted yet. Call 'fit' with appropriate arguments before using this estimator
File "C:\Users\levesque\Documents\Python\AMEX\env\lib\site-packages\sklearn\model_selection\_search.py", line 875, in fit
self._run_search(evaluate_candidates)
File "C:\Users\levesque\Documents\Python\AMEX\env\lib\site-packages\sklearn\model_selection\_search.py", line 1379, in _run_search
evaluate_candidates(ParameterGrid(self.param_grid))
File "C:\Users\levesque\Documents\Python\AMEX\env\lib\site-packages\sklearn\model_selection\_search.py", line 852, in evaluate_candidates
_warn_or_raise_about_fit_failures(out, self.error_score)
File "C:\Users\levesque\Documents\Python\AMEX\env\lib\site-packages\sklearn\model_selection\_validation.py", line 367, in _warn_or_raise_about_fit_failures
raise ValueError(all_fits_failed_message)
ValueError:
All the 5 fits failed.
It is very likely that your model is misconfigured.
You can try to debug the error by setting error_score='raise'.
Below are more details about the failures:
--------------------------------------------------------------------------------
5 fits failed with the following error:
Traceback (most recent call last):
File "C:\Users\levesque\Documents\Python\AMEX\env\lib\site-packages\sklearn\model_selection\_validation.py", line 686, in _fit_and_score
estimator.fit(X_train, y_train, **fit_params)
File "C:\Users\levesque\Documents\Python\AMEX\env\lib\site-packages\sklearn\ensemble\_stacking.py", line 584, in fit
return super().fit(X, self._le.transform(y), sample_weight)
File "C:\Users\levesque\Documents\Python\AMEX\env\lib\site-packages\sklearn\ensemble\_stacking.py", line 183, in fit
check_is_fitted(estimator)
File "C:\Users\levesque\Documents\Python\AMEX\env\lib\site-packages\sklearn\utils\validation.py", line 1345, in check_is_fitted
raise NotFittedError(msg % {"name": type(estimator).__name__})
sklearn.exceptions.NotFittedError: This LogisticRegression instance is not fitted yet. Call 'fit' with appropriate arguments before using this estimator.
I believe this is because the search clones its estimator stack, and that in turn clones the base stack.estimators so that indeed they are not fitted when each hyperparameter setting attempts to fit. This is called out (indirectly) in the Glossary entry for metaestimators:
In a meta-estimator’s fit method, any contained estimators should be cloned before they are fit (although FIXME: Pipeline and FeatureUnion do not do this currently). An exception to this is that an estimator may explicitly document that it accepts a pre-fitted estimator (e.g. using prefit=True in feature_selection.SelectFromModel). One known issue with this is that the pre-fitted estimator will lose its model if the meta-estimator is cloned.
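The loss of fitted state on cloning can be checked directly, without GridSearchCV. A small sketch, assuming seeded toy data; `is_fitted` is a helper introduced here for illustration, not a scikit-learn function:

```python
import numpy as np
from sklearn.base import clone
from sklearn.ensemble import GradientBoostingClassifier, StackingClassifier
from sklearn.exceptions import NotFittedError
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.utils.validation import check_is_fitted


def is_fitted(estimator):
    """Illustrative helper: True if check_is_fitted accepts the estimator."""
    try:
        check_is_fitted(estimator)
        return True
    except NotFittedError:
        return False


rng = np.random.RandomState(0)
train = rng.rand(40, 3)
label = rng.randint(0, 2, 40)

log_clf = LogisticRegression().fit(train, label)
gau_clf = GaussianNB().fit(train, label)
stack = StackingClassifier(
    estimators=[("log", log_clf), ("gau", gau_clf)],
    final_estimator=GradientBoostingClassifier(),
    cv="prefit",
)

# clone() rebuilds every nested estimator from its parameters only,
# so the learned attributes (coef_, etc.) are not carried over.
cloned_stack = clone(stack)

original_fitted = [is_fitted(est) for _, est in stack.estimators]
cloned_fitted = [is_fitted(est) for _, est in cloned_stack.estimators]
print("original:", original_fitted)
print("cloned:  ", cloned_fitted)
```

The search clones the stack before every fit, so the stack it actually fits looks like `cloned_stack` here.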
Is there precedent for overriding clone behavior by estimator, or would clone itself need a tweak to accommodate leaving base estimators alone when cv='prefit'? There are other prefit options around: CalibratedClassifierCV (not a likely candidate for prefit and cloning/searching, I'd guess) and SelectFromModel (where the parameter is prefit [bool] as opposed to cv='prefit' as here).
Indeed, the issue comes from cloning where any nested estimator will be cloned regardless of some parameters of the meta-estimator. Here, the cv="prefit" should indicate to the cloning that it should not try to clone the estimators.