-
-
Notifications
You must be signed in to change notification settings - Fork 25.9k
SLEP006 - metadata handling: fit, transform, fit_transform #22987
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
I favour option 2 personally. Certainly in the current codebase, we're only really talking about parameters to |
I prefer merging |
I also favor option 2, merging metadata routing for |
Fixed in #26506 |
The question of how to handle metadata routing in
fit_transform
has come up a few times and we don't have a clear solution to it yet. This is the latest conversation: https://github.com/scikit-learn/scikit-learn/pull/22083/files#r811448325A nice summery made by @lorentzenchr is:
fit_transform
as separate method with its ownset_fit_transform_requests
.fit
andtransform
(error if inconsistent)fit_transform
that only calls.fit(X).transform(X)
and the rest (where it does something meaningful).Option 1 is the easiest to implement, but it gets confusing when users put transformers in a pipeline. The user wouldn't necessarily know which meta-estimator does
fit().transform()
and which meta-estimator would do.fit_transform()
.Option 2 is doable by adding machinery in
MetadataRequest
andMetadataRouter
objects. We can have a common test which makes surefit_transform
always acceptsfit_requests | transform_requests
.Option 3 requires change in the
base.py
and probably leads to much more discussions than the other 2 options.also cc @jnothman @agramfort
The text was updated successfully, but these errors were encountered: