Description
Describe the bug
I'm trying to build a simple stacking regressor from a prefit estimator and compute its cross-validation score, but an error is raised:
NotFittedError: This RandomForestRegressor instance is not fitted yet. Call 'fit' with appropriate arguments before using this estimator.
Steps/Code to Reproduce
from sklearn.ensemble import StackingRegressor, RandomForestRegressor
from sklearn.datasets import make_regression
from sklearn.model_selection import cross_val_score
X, y = make_regression()
rf = RandomForestRegressor(n_jobs=-1, random_state=42)
rf.fit(X, y)
stack = StackingRegressor([("rf", rf)], cv="prefit")
cross_val_score(estimator=stack, X=X, y=y, scoring="neg_mean_absolute_error", cv=5, n_jobs=-1, error_score="raise")
Expected Results
If I understand everything correctly, this should run without error and return the cross-validation scores.
Actual Results
NotFittedError: This RandomForestRegressor instance is not fitted yet. Call 'fit' with appropriate arguments before using this estimator.
Full traceback
---------------------------------------------------------------------------
_RemoteTraceback Traceback (most recent call last)
_RemoteTraceback:
"""
Traceback (most recent call last):
File "/home/guillem.garcia/miniconda3/envs/skbug/lib/python3.10/site-packages/joblib/externals/loky/process_executor.py", line 436, in _process_worker
r = call_item()
File "/home/guillem.garcia/miniconda3/envs/skbug/lib/python3.10/site-packages/joblib/externals/loky/process_executor.py", line 288, in __call__
return self.fn(*self.args, **self.kwargs)
File "/home/guillem.garcia/miniconda3/envs/skbug/lib/python3.10/site-packages/joblib/_parallel_backends.py", line 595, in __call__
return self.func(*args, **kwargs)
File "/home/guillem.garcia/miniconda3/envs/skbug/lib/python3.10/site-packages/joblib/parallel.py", line 262, in __call__
return [func(*args, **kwargs)
File "/home/guillem.garcia/miniconda3/envs/skbug/lib/python3.10/site-packages/joblib/parallel.py", line 262, in <listcomp>
return [func(*args, **kwargs)
File "/home/guillem.garcia/miniconda3/envs/skbug/lib/python3.10/site-packages/sklearn/utils/fixes.py", line 117, in __call__
return self.function(*args, **kwargs)
File "/home/guillem.garcia/miniconda3/envs/skbug/lib/python3.10/site-packages/sklearn/model_selection/_validation.py", line 686, in _fit_and_score
estimator.fit(X_train, y_train, **fit_params)
File "/home/guillem.garcia/miniconda3/envs/skbug/lib/python3.10/site-packages/sklearn/ensemble/_stacking.py", line 879, in fit
return super().fit(X, y, sample_weight)
File "/home/guillem.garcia/miniconda3/envs/skbug/lib/python3.10/site-packages/sklearn/ensemble/_stacking.py", line 189, in fit
check_is_fitted(estimator)
File "/home/guillem.garcia/miniconda3/envs/skbug/lib/python3.10/site-packages/sklearn/utils/validation.py", line 1352, in check_is_fitted
raise NotFittedError(msg % {"name": type(estimator).__name__})
sklearn.exceptions.NotFittedError: This RandomForestRegressor instance is not fitted yet. Call 'fit' with appropriate arguments before using this estimator.
"""
The above exception was the direct cause of the following exception:
NotFittedError Traceback (most recent call last)
Cell In [3], line 8
6 rf.fit(X, y)
7 stack = StackingRegressor([("rf", rf)], cv="prefit")
----> 8 cross_val_score(estimator=stack, X=X, y=y, scoring="neg_mean_absolute_error", cv=5, n_jobs=-1, error_score="raise")
File ~/miniconda3/envs/skbug/lib/python3.10/site-packages/sklearn/model_selection/_validation.py:515, in cross_val_score(estimator, X, y, groups, scoring, cv, n_jobs, verbose, fit_params, pre_dispatch, error_score)
512 # To ensure multimetric format is not supported
513 scorer = check_scoring(estimator, scoring=scoring)
--> 515 cv_results = cross_validate(
516 estimator=estimator,
517 X=X,
518 y=y,
519 groups=groups,
520 scoring={"score": scorer},
521 cv=cv,
522 n_jobs=n_jobs,
523 verbose=verbose,
524 fit_params=fit_params,
525 pre_dispatch=pre_dispatch,
526 error_score=error_score,
527 )
528 return cv_results["test_score"]
File ~/miniconda3/envs/skbug/lib/python3.10/site-packages/sklearn/model_selection/_validation.py:266, in cross_validate(estimator, X, y, groups, scoring, cv, n_jobs, verbose, fit_params, pre_dispatch, return_train_score, return_estimator, error_score)
263 # We clone the estimator to make sure that all the folds are
264 # independent, and that it is pickle-able.
265 parallel = Parallel(n_jobs=n_jobs, verbose=verbose, pre_dispatch=pre_dispatch)
--> 266 results = parallel(
267 delayed(_fit_and_score)(
268 clone(estimator),
269 X,
270 y,
271 scorers,
272 train,
273 test,
274 verbose,
275 None,
276 fit_params,
277 return_train_score=return_train_score,
278 return_times=True,
279 return_estimator=return_estimator,
280 error_score=error_score,
281 )
282 for train, test in cv.split(X, y, groups)
283 )
285 _warn_or_raise_about_fit_failures(results, error_score)
287 # For callabe scoring, the return type is only know after calling. If the
288 # return type is a dictionary, the error scores can now be inserted with
289 # the correct key.
File ~/miniconda3/envs/skbug/lib/python3.10/site-packages/joblib/parallel.py:1056, in Parallel.__call__(self, iterable)
1053 self._iterating = False
1055 with self._backend.retrieval_context():
-> 1056 self.retrieve()
1057 # Make sure that we get a last message telling us we are done
1058 elapsed_time = time.time() - self._start_time
File ~/miniconda3/envs/skbug/lib/python3.10/site-packages/joblib/parallel.py:935, in Parallel.retrieve(self)
933 try:
934 if getattr(self._backend, 'supports_timeout', False):
--> 935 self._output.extend(job.get(timeout=self.timeout))
936 else:
937 self._output.extend(job.get())
File ~/miniconda3/envs/skbug/lib/python3.10/site-packages/joblib/_parallel_backends.py:542, in LokyBackend.wrap_future_result(future, timeout)
539 """Wrapper for Future.result to implement the same behaviour as
540 AsyncResults.get from multiprocessing."""
541 try:
--> 542 return future.result(timeout=timeout)
543 except CfTimeoutError as e:
544 raise TimeoutError from e
File ~/miniconda3/envs/skbug/lib/python3.10/concurrent/futures/_base.py:446, in Future.result(self, timeout)
444 raise CancelledError()
445 elif self._state == FINISHED:
--> 446 return self.__get_result()
447 else:
448 raise TimeoutError()
File ~/miniconda3/envs/skbug/lib/python3.10/concurrent/futures/_base.py:391, in Future.__get_result(self)
389 if self._exception:
390 try:
--> 391 raise self._exception
392 finally:
393 # Break a reference cycle with the exception in self._exception
394 self = None
NotFittedError: This RandomForestRegressor instance is not fitted yet. Call 'fit' with appropriate arguments before using this estimator.
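The traceback passes through `clone(estimator)` in `cross_validate` before `check_is_fitted(estimator)` fails in `_stacking.py`, so the fitted state may be lost when the stacker is cloned. Here is a minimal sketch that checks this without `cross_val_score`. It assumes that `clone` is the step responsible; the names `cloned_stack`, `cloned_rf` and `clone_is_fitted` are only illustrative, and `n_estimators=10` is chosen just to keep it fast.

```python
from sklearn.base import clone
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor, StackingRegressor
from sklearn.exceptions import NotFittedError
from sklearn.utils.validation import check_is_fitted

X, y = make_regression(random_state=0)

# Fit the base estimator up front, as in the report above.
rf = RandomForestRegressor(n_estimators=10, random_state=42)
rf.fit(X, y)
check_is_fitted(rf)  # does not raise: the original forest is fitted

stack = StackingRegressor([("rf", rf)], cv="prefit")

# cross_validate clones the estimator once per fold; do the same by hand.
cloned_stack = clone(stack)
cloned_rf = cloned_stack.estimators[0][1]

# Check whether the forest inside the cloned stacker is still fitted.
try:
    check_is_fitted(cloned_rf)
    clone_is_fitted = True
except NotFittedError:
    clone_is_fitted = False

print("clone is fitted:", clone_is_fitted)
```

If this prints `False`, the nested forest is reset by cloning, which would explain why `cv="prefit"` fails inside `cross_val_score` even though `rf.fit(X, y)` was called beforehand.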
Versions
I installed the latest nightly build.
System:
python: 3.10.4 (main, Mar 31 2022, 08:41:55) [GCC 7.5.0]
executable: /home/guillem.garcia/miniconda3/envs/skbug/bin/python
machine: Linux-5.15.0-47-generic-x86_64-with-glibc2.35
Python dependencies:
sklearn: 1.2.dev0
pip: 22.1.2
setuptools: 63.4.1
numpy: 1.24.0.dev0+703.gb2fbc4349
scipy: 1.10.0.dev0
Cython: None
pandas: None
matplotlib: None
joblib: 1.1.0
threadpoolctl: 3.1.0
Built with OpenMP: True
threadpoolctl info:
user_api: openmp
internal_api: openmp
prefix: libgomp
filepath: /home/guillem.garcia/miniconda3/envs/skbug/lib/python3.10/site-packages/scikit_learn.libs/libgomp-a34b3233.so.1.0.0
version: None
num_threads: 8
user_api: blas
internal_api: openblas
prefix: libopenblas
filepath: /home/guillem.garcia/miniconda3/envs/skbug/lib/python3.10/site-packages/numpy.libs/libopenblas64_p-r0-742d56dc.3.20.so
version: 0.3.20
threading_layer: pthreads
architecture: Haswell
num_threads: 8
user_api: blas
internal_api: openblas
prefix: libopenblas
filepath: /home/guillem.garcia/miniconda3/envs/skbug/lib/python3.10/site-packages/scipy.libs/libopenblasp-r0-41284840.3.18.so
version: 0.3.18
threading_layer: pthreads
architecture: Haswell
num_threads: 8