FEA Add metadata routing through predict methods of BaggingClassifier and BaggingRegressor#30833
Conversation
| ) | ||
| y_proba = np.empty(shape=(len(X), 2)) | ||
| y_proba[: len(X) // 2, :] = np.asarray([1.0, 0.0]) | ||
| y_proba[len(X) // 2 :, :] = np.asarray([0.0, 1.0]) | ||
| y_proba = np.empty(shape=(len(X), len(self.classes_)), dtype=np.float32) | ||
| # each row sums up to 1.0: | ||
| y_proba[:] = np.random.dirichlet(alpha=np.ones(len(self.classes_)), size=len(X)) | ||
| return y_proba |
There was a problem hiding this comment.
It is necessary to make sure predict_proba and predict_log_proba return a column per class for all the test classes (ConsumingClassifier and NonConsumingClassifier) to avoid shape mismatch while testing.
There was a problem hiding this comment.
A few comments otherwise looks good
There was a problem hiding this comment.
Thank you for your review, @OmarManzoor!
I have added a comment on the conditions for the router.
There was a problem hiding this comment.
LGTM. Thanks @StefanieSenger
Co-authored-by: Omar Salman <omar.salman2007@gmail.com>
There was a problem hiding this comment.
Overall looks good, but a few things need fixing. This is not a complete review. Please ping me once you're ready for another round.
There was a problem hiding this comment.
Thank you for reviewing, @adrinjalali!
I have changed the naming of the routed kwargs into **params, added documentation on the routing depending on which methods are available in the sub-estimators, moved the changelog entry into the correct section, re-defined the router depending on whether predict_log_proba is available and added corresponding test cases.
This PR is ready for another round of reviewing.
| # `sample_weight` is passed to the respective methods dynamically at | ||
| # runtime: | ||
| if hasattr(self._get_estimator(), "predict_proba"): | ||
| method_mapping.add(caller="predict_log_proba", callee="predict_proba") |
There was a problem hiding this comment.
seems like there's no test case for this one
There was a problem hiding this comment.
That is because we need to test it with a classifier that does not have predict_log_proba, but has predict_proba.
I have added an "outsourced" test just as I did for the case when the sub-classifier doesn't have predict_proba.
sklearn/ensemble/_bagging.py
Outdated
| routed_params = process_routing(self, "predict_proba", **params) | ||
| else: | ||
| routed_params = Bunch() | ||
| routed_params.estimator = Bunch(predict_proba=params) |
There was a problem hiding this comment.
we don't really support routing metadata around if metadata routing is disabled anyway (the _raise_for_params above). So here the Bunch can/should be empty.
sklearn/ensemble/_bagging.py
Outdated
| **params : dict | ||
| Parameters routed to the `predict_log_proba`, the `predict_proba` or the | ||
| `proba` method of the sub-estimators via the metadata routing API. The | ||
| routing is tried in the before mentioned order depending on whether this |
There was a problem hiding this comment.
| routing is tried in the before mentioned order depending on whether this | |
| routing is tried in the mentioned order depending on whether this |
sklearn/ensemble/_bagging.py
Outdated
| routed_params = process_routing(self, "predict_log_proba", **params) | ||
| else: | ||
| routed_params = Bunch() | ||
| routed_params.estimator = Bunch(predict_log_proba=params) |
There was a problem hiding this comment.
same here with not routing params
sklearn/ensemble/_bagging.py
Outdated
| routed_params = process_routing(self, "decision_function", **params) | ||
| else: | ||
| routed_params = Bunch() | ||
| routed_params.estimator = Bunch(decision_function=params) |
sklearn/ensemble/_bagging.py
Outdated
| routed_params = process_routing(self, "predict", **params) | ||
| else: | ||
| routed_params = Bunch() | ||
| routed_params.estimator = Bunch(predict=params) |
There was a problem hiding this comment.
Thanks for your review, @adrinjalali.
I have removed all the passed params from the bunches without metadata routing and added test cases and new test classes for dynamic method selection.
| # `sample_weight` is passed to the respective methods dynamically at | ||
| # runtime: | ||
| if hasattr(self._get_estimator(), "predict_proba"): | ||
| method_mapping.add(caller="predict_log_proba", callee="predict_proba") |
There was a problem hiding this comment.
That is because we need to test it with a classifier that does not have predict_log_proba, but has predict_proba.
I have added an "outsourced" test just as I did for the case when the sub-classifier doesn't have predict_proba.
| @@ -0,0 +1,4 @@ | |||
| - :class:`ensemble.BaggingClassifier` and :class:`ensemble.BaggingRegressor` now support | |||
| metadata routing through their `predict`, `predict_proba`, `predict_log_proba` and | |||
| `decision_function` methods and pass `**predict_params` to the underlying estimators. | |||
There was a problem hiding this comment.
| `decision_function` methods and pass `**predict_params` to the underlying estimators. | |
| `decision_function` methods and pass `**params` to the underlying estimators. |
sklearn/ensemble/_bagging.py
Outdated
| reset=False, | ||
| ) | ||
|
|
||
| _raise_for_params(params, self, "predict") |
There was a problem hiding this comment.
I'd put all these _raise_form_params calls as the first thing in these methods, so that the user knows as soon as possible that they need to change their script, instead of fitting and fixing their data, and then suddenly getting this error.
There was a problem hiding this comment.
Yes, that makes sense, especially if fitting takes a while. I have put all the _raise_for_params() first.
| ) | ||
| bagging = BaggingClassifier(estimator=estimator) | ||
| bagging.fit(X, y) | ||
| bagging.predict(X=np.array([[1, 1], [1, 3], [0, 2]])) |
There was a problem hiding this comment.
I think the reason we still have a codecov warning is that here you're not calling predict_log_proba and/or predict_proba. I think we need to test all the 4 methods, and also need to pass metadata here to actually record something and activate the routing mechanism.
There was a problem hiding this comment.
Fixed in commit a3acb0. Thanks for your help with it!
Reference Issues/PRs
closes #30808
towards #22893
What does this implement/fix? Explain your changes.
Add metadata routing functionality to
BaggingClassifierandBaggingRegressor'spredict,predict_probapredict_log_probaanddecision_functionmethods.CC @adrinjalali @OmarManzoor
Would you like to have a look?