-
-
Notifications
You must be signed in to change notification settings - Fork 25.9k
API deprecate CalibratedClassifierCV(..., cv=prefit) for FrozenEstimator #30171
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks simple enough to me, but I'm by no means an expert in this area of the codebase :p
Co-authored-by: Adam Li <adam2392@gmail.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have mixed feelings about this: asking the user to import a class and wrap an argument feels like a small usability regression.
What about not deprecating cv="prefit"
and automatically do the wrapping with FrozenEstimator
on the fly in _get_estimator
:
def _get_estimator(self):
"""Resolve which estimator to return (default is LinearSVC)"""
if self.cv == "prefit" and self.estimator is None:
raise ValueError(
'estimator cannot be left to None when cv="prefit"'
)
if self.estimator is None:
# we want all classifiers that don't expose a random_state
# to be deterministic (and we don't want to expose this one).
estimator = LinearSVC(random_state=0)
if _routing_enabled():
estimator.set_fit_request(sample_weight=True)
else:
estimator = self.estimator
if self.cv == "prefit":
estimator = FrozenEstimator(estimator)
return estimator
But then we would have too equivalent ways to do the same thing:
from sklearn.frozen import FrozenEstimator
CalibratedClassifierCV(FrozenEstimator(model)).fit(X_cal, y_cal)
and:
CalibratedClassifierCV(model, cv="prefit").fit(X_cal, y_cal)
But maybe that's ok?
I wouldn't say it's a usability regression. |
I partially agree... But at the same time, calling In that respect, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If @thomasjpfan and others think it's better, and I am in the minority, then so be it. It's true that this will also allow a small code simplification once the deprecation period is over.
@@ -540,7 +554,7 @@ def test_calibration_dict_pipeline(dict_data, dict_data_pipeline): | |||
""" | |||
X, y = dict_data | |||
clf = dict_data_pipeline | |||
calib_clf = CalibratedClassifierCV(clf, cv="prefit") | |||
calib_clf = CalibratedClassifierCV(FrozenEstimator(clf), cv=2) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change highlights another small usability regression: it's difficult to understand why we would need to adjust the CV strategy when calibrating a frozen estimator on a dataset with less than 5 data-points per class. One could expect that cross-fitting would be entirely disabled in that case.
Again, I can live with it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd agree with you, but in reality, people with 5 data points shouldn't be doing statistical ML 😁
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In particular, meaningful calibration with 5 data points per class is quite challenging. Still, it reveals a rough edge in my opinion. But I believe we can fix it later if needed.
Towards #29893
This deprecates
CalibratedClassifierCV(..., cv="prefit")
in favor ofCalibratedClassifierCV(FrozenEstimator(est))
.Also adds
ensemble="auto"
option for automatically handlingFrozenEstimator
correctly.cc @glemaitre @ogrisel