`cross_val_score` crashes with `StackingRegressor` · Issue #24430 · scikit-learn/scikit-learn · GitHub
cross_val_score crashes with StackingRegressor #24430
Closed
@GuillemGSubies

Description

Describe the bug

I'm trying to build a simple stacking model and compute its cross-validation score, but an error is raised:

NotFittedError: This RandomForestRegressor instance is not fitted yet. Call 'fit' with appropriate arguments before using this estimator.

Steps/Code to Reproduce

from sklearn.ensemble import StackingRegressor, RandomForestRegressor
from sklearn.datasets import make_regression
from sklearn.model_selection import cross_val_score
X, y = make_regression()
rf = RandomForestRegressor(n_jobs=-1, random_state=42)
rf.fit(X, y)
stack = StackingRegressor([("rf", rf)], cv="prefit")
cross_val_score(estimator=stack, X=X, y=y, scoring="neg_mean_absolute_error", cv=5, n_jobs=-1, error_score="raise")

Expected Results

If I understand everything correctly, this should run without errors.

Actual Results

NotFittedError: This RandomForestRegressor instance is not fitted yet. Call 'fit' with appropriate arguments before using this estimator.

Full traceback

---------------------------------------------------------------------------
_RemoteTraceback                          Traceback (most recent call last)
_RemoteTraceback: 
"""
Traceback (most recent call last):
File "/home/guillem.garcia/miniconda3/envs/skbug/lib/python3.10/site-packages/joblib/externals/loky/process_executor.py", line 436, in _process_worker
  r = call_item()
File "/home/guillem.garcia/miniconda3/envs/skbug/lib/python3.10/site-packages/joblib/externals/loky/process_executor.py", line 288, in __call__
  return self.fn(*self.args, **self.kwargs)
File "/home/guillem.garcia/miniconda3/envs/skbug/lib/python3.10/site-packages/joblib/_parallel_backends.py", line 595, in __call__
  return self.func(*args, **kwargs)
File "/home/guillem.garcia/miniconda3/envs/skbug/lib/python3.10/site-packages/joblib/parallel.py", line 262, in __call__
  return [func(*args, **kwargs)
File "/home/guillem.garcia/miniconda3/envs/skbug/lib/python3.10/site-packages/joblib/parallel.py", line 262, in <listcomp>
  return [func(*args, **kwargs)
File "/home/guillem.garcia/miniconda3/envs/skbug/lib/python3.10/site-packages/sklearn/utils/fixes.py", line 117, in __call__
  return self.function(*args, **kwargs)
File "/home/guillem.garcia/miniconda3/envs/skbug/lib/python3.10/site-packages/sklearn/model_selection/_validation.py", line 686, in _fit_and_score
  estimator.fit(X_train, y_train, **fit_params)
File "/home/guillem.garcia/miniconda3/envs/skbug/lib/python3.10/site-packages/sklearn/ensemble/_stacking.py", line 879, in fit
  return super().fit(X, y, sample_weight)
File "/home/guillem.garcia/miniconda3/envs/skbug/lib/python3.10/site-packages/sklearn/ensemble/_stacking.py", line 189, in fit
  check_is_fitted(estimator)
File "/home/guillem.garcia/miniconda3/envs/skbug/lib/python3.10/site-packages/sklearn/utils/validation.py", line 1352, in check_is_fitted
  raise NotFittedError(msg % {"name": type(estimator).__name__})
sklearn.exceptions.NotFittedError: This RandomForestRegressor instance is not fitted yet. Call 'fit' with appropriate arguments before using this estimator.
"""

The above exception was the direct cause of the following exception:

NotFittedError                            Traceback (most recent call last)
Cell In [3], line 8
    6 rf.fit(X, y)
    7 stack = StackingRegressor([("rf", rf)], cv="prefit")
----> 8 cross_val_score(estimator=stack, X=X, y=y, scoring="neg_mean_absolute_error", cv=5, n_jobs=-1, error_score="raise")

File ~/miniconda3/envs/skbug/lib/python3.10/site-packages/sklearn/model_selection/_validation.py:515, in cross_val_score(estimator, X, y, groups, scoring, cv, n_jobs, verbose, fit_params, pre_dispatch, error_score)
  512 # To ensure multimetric format is not supported
  513 scorer = check_scoring(estimator, scoring=scoring)
--> 515 cv_results = cross_validate(
  516     estimator=estimator,
  517     X=X,
  518     y=y,
  519     groups=groups,
  520     scoring={"score": scorer},
  521     cv=cv,
  522     n_jobs=n_jobs,
  523     verbose=verbose,
  524     fit_params=fit_params,
  525     pre_dispatch=pre_dispatch,
  526     error_score=error_score,
  527 )
  528 return cv_results["test_score"]

File ~/miniconda3/envs/skbug/lib/python3.10/site-packages/sklearn/model_selection/_validation.py:266, in cross_validate(estimator, X, y, groups, scoring, cv, n_jobs, verbose, fit_params, pre_dispatch, return_train_score, return_estimator, error_score)
  263 # We clone the estimator to make sure that all the folds are
  264 # independent, and that it is pickle-able.
  265 parallel = Parallel(n_jobs=n_jobs, verbose=verbose, pre_dispatch=pre_dispatch)
--> 266 results = parallel(
  267     delayed(_fit_and_score)(
  268         clone(estimator),
  269         X,
  270         y,
  271         scorers,
  272         train,
  273         test,
  274         verbose,
  275         None,
  276         fit_params,
  277         return_train_score=return_train_score,
  278         return_times=True,
  279         return_estimator=return_estimator,
  280         error_score=error_score,
  281     )
  282     for train, test in cv.split(X, y, groups)
  283 )
  285 _warn_or_raise_about_fit_failures(results, error_score)
  287 # For callabe scoring, the return type is only know after calling. If the
  288 # return type is a dictionary, the error scores can now be inserted with
  289 # the correct key.

File ~/miniconda3/envs/skbug/lib/python3.10/site-packages/joblib/parallel.py:1056, in Parallel.__call__(self, iterable)
 1053     self._iterating = False
 1055 with self._backend.retrieval_context():
-> 1056     self.retrieve()
 1057 # Make sure that we get a last message telling us we are done
 1058 elapsed_time = time.time() - self._start_time

File ~/miniconda3/envs/skbug/lib/python3.10/site-packages/joblib/parallel.py:935, in Parallel.retrieve(self)
  933 try:
  934     if getattr(self._backend, 'supports_timeout', False):
--> 935         self._output.extend(job.get(timeout=self.timeout))
  936     else:
  937         self._output.extend(job.get())

File ~/miniconda3/envs/skbug/lib/python3.10/site-packages/joblib/_parallel_backends.py:542, in LokyBackend.wrap_future_result(future, timeout)
  539 """Wrapper for Future.result to implement the same behaviour as
  540 AsyncResults.get from multiprocessing."""
  541 try:
--> 542     return future.result(timeout=timeout)
  543 except CfTimeoutError as e:
  544     raise TimeoutError from e

File ~/miniconda3/envs/skbug/lib/python3.10/concurrent/futures/_base.py:446, in Future.result(self, timeout)
  444     raise CancelledError()
  445 elif self._state == FINISHED:
--> 446     return self.__get_result()
  447 else:
  448     raise TimeoutError()

File ~/miniconda3/envs/skbug/lib/python3.10/concurrent/futures/_base.py:391, in Future.__get_result(self)
  389 if self._exception:
  390     try:
--> 391         raise self._exception
  392     finally:
  393         # Break a reference cycle with the exception in self._exception
  394         self = None

NotFittedError: This RandomForestRegressor instance is not fitted yet. Call 'fit' with appropriate arguments before using this estimator.
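
The traceback shows `cross_validate` calling `clone(estimator)` before fitting each fold. The snippet below is a minimal sketch of what that step may be doing to the prefit forest. It assumes that `clone` rebuilds nested estimators from their parameters only, so the fitted state is not carried over. The variable names `cloned_stack`, `cloned_rf` and `clone_is_fitted` are illustrative, and `n_estimators=10` is only there to keep it fast.

```python
from sklearn.base import clone
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor, StackingRegressor
from sklearn.exceptions import NotFittedError
from sklearn.utils.validation import check_is_fitted

X, y = make_regression(random_state=0)

# Fit the base estimator up front, as in the report.
rf = RandomForestRegressor(n_estimators=10, random_state=42)
rf.fit(X, y)
stack = StackingRegressor([("rf", rf)], cv="prefit")

# The original forest is fitted, so this does not raise.
check_is_fitted(rf)

# cross_validate clones the estimator for every fold (see the traceback).
cloned_stack = clone(stack)
cloned_rf = cloned_stack.estimators[0][1]

# The cloned forest is a new, unfitted object with the same parameters.
try:
    check_is_fitted(cloned_rf)
    clone_is_fitted = True
except NotFittedError:
    clone_is_fitted = False

print("clone is the same object:", cloned_rf is rf)
print("clone is fitted:", clone_is_fitted)
```

If this matches what happens inside `cross_val_score`, the `check_is_fitted(estimator)` call in `_stacking.py` would be looking at the unfitted clone rather than at the `rf` fitted above.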

Versions

I installed the latest nightly


System:
    python: 3.10.4 (main, Mar 31 2022, 08:41:55) [GCC 7.5.0]
executable: /home/guillem.garcia/miniconda3/envs/skbug/bin/python
   machine: Linux-5.15.0-47-generic-x86_64-with-glibc2.35

Python dependencies:
      sklearn: 1.2.dev0
          pip: 22.1.2
   setuptools: 63.4.1
        numpy: 1.24.0.dev0+703.gb2fbc4349
        scipy: 1.10.0.dev0
       Cython: None
       pandas: None
   matplotlib: None
       joblib: 1.1.0
threadpoolctl: 3.1.0

Built with OpenMP: True

threadpoolctl info:
       user_api: openmp
   internal_api: openmp
         prefix: libgomp
       filepath: /home/guillem.garcia/miniconda3/envs/skbug/lib/python3.10/site-packages/scikit_learn.libs/libgomp-a34b3233.so.1.0.0
        version: None
    num_threads: 8

       user_api: blas
   internal_api: openblas
         prefix: libopenblas
       filepath: /home/guillem.garcia/miniconda3/envs/skbug/lib/python3.10/site-packages/numpy.libs/libopenblas64_p-r0-742d56dc.3.20.so
        version: 0.3.20
threading_layer: pthreads
   architecture: Haswell
    num_threads: 8

       user_api: blas
   internal_api: openblas
         prefix: libopenblas
       filepath: /home/guillem.garcia/miniconda3/envs/skbug/lib/python3.10/site-packages/scipy.libs/libopenblasp-r0-41284840.3.18.so
        version: 0.3.18
threading_layer: pthreads
   architecture: Haswell
    num_threads: 8
