SLEP006 - metadata handling: fit, transform, fit_transform · Issue #22987 · scikit-learn/scikit-learn · GitHub

SLEP006 - metadata handling: fit, transform, fit_transform #22987


Closed
Tracked by #22893
adrinjalali opened this issue Mar 29, 2022 · 4 comments

Comments

@adrinjalali
Member

The question of how to handle metadata routing in fit_transform has come up a few times and we don't have a clear solution to it yet. This is the latest conversation: https://github.com/scikit-learn/scikit-learn/pull/22083/files#r811448325

A nice summary by @lorentzenchr:

  1. Handle fit_transform as separate method with its own set_fit_transform_requests.
  2. Merge the requests of fit and transform (error if inconsistent)
  3. Distinguish between fit_transform that only calls .fit(X).transform(X) and the rest (where it does something meaningful).

Option 1 is the easiest to implement, but it gets confusing when users put transformers in a pipeline. The user wouldn't necessarily know which meta-estimator does fit().transform() and which meta-estimator would do .fit_transform().
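
To make the confusion concrete, here is a toy sketch that does not use scikit-learn at all. `ToyScaler`, `route`, `meta_a` and `meta_b` are hypothetical names invented for illustration, and the per-method request sets only approximate what Option 1 would look like. The user configures `fit` only, and whether `sample_weight` reaches the estimator then depends on which method the meta-estimator happens to call.

```python
class ToyScaler:
    """Hypothetical transformer with per-method metadata requests (Option 1)."""

    def __init__(self):
        # One independent request set per method, as Option 1 proposes.
        self.requests = {"fit": set(), "transform": set(), "fit_transform": set()}
        self.seen = None  # names of the metadata that reached the fitting step

    def fit(self, X, **metadata):
        self.seen = sorted(metadata)
        return self

    def transform(self, X, **metadata):
        return X

    def fit_transform(self, X, **metadata):
        self.seen = sorted(metadata)
        return X


def route(est, method, metadata):
    """Keep only the metadata that `est` requested for `method`."""
    return {k: v for k, v in metadata.items() if k in est.requests[method]}


def meta_a(est, X, **metadata):
    """Hypothetical meta-estimator that calls fit() and then transform()."""
    est.fit(X, **route(est, "fit", metadata))
    return est.transform(X, **route(est, "transform", metadata))


def meta_b(est, X, **metadata):
    """Hypothetical meta-estimator that calls fit_transform() directly."""
    return est.fit_transform(X, **route(est, "fit_transform", metadata))


a, b = ToyScaler(), ToyScaler()
for est in (a, b):
    est.requests["fit"].add("sample_weight")  # the user only configures fit

meta_a(a, [[1.0]], sample_weight=[2.0])
meta_b(b, [[1.0]], sample_weight=[2.0])
print(a.seen)  # ['sample_weight']
print(b.seen)  # []
```

The same user configuration silently drops `sample_weight` under `meta_b`, which is the surprise described above.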

Option 2 is doable by adding machinery in MetadataRequest and MetadataRouter objects. We can have a common test which makes sure fit_transform always accepts fit_requests | transform_requests.
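
A minimal sketch of what "merge, error if inconsistent" could mean, assuming requests are represented as plain dicts mapping a method parameter name to the user-facing alias it is routed from. `merge_requests` is a hypothetical helper for illustration, not the actual `MetadataRequest` machinery.

```python
def merge_requests(fit_requests, transform_requests):
    """Union two request mappings; raise if the same name is routed differently.

    Both arguments map a parameter name (e.g. "sample_weight") to the alias
    under which the user passes it. This representation is an assumption.
    """
    merged = dict(fit_requests)
    for name, alias in transform_requests.items():
        if name in merged and merged[name] != alias:
            raise ValueError(
                f"Inconsistent requests for {name!r}: "
                f"fit wants {merged[name]!r}, transform wants {alias!r}"
            )
        merged[name] = alias
    return merged


# Disjoint requests simply union.
print(merge_requests({"sample_weight": "weights"}, {"groups": "groups"}))
# {'sample_weight': 'weights', 'groups': 'groups'}

# Identical requests collapse to one entry.
print(merge_requests({"sample_weight": "weights"}, {"sample_weight": "weights"}))
# {'sample_weight': 'weights'}

# Conflicting aliases are rejected.
try:
    merge_requests({"sample_weight": "weights"}, {"sample_weight": "other"})
except ValueError as exc:
    print("error:", exc)
```

A common test along these lines would then assert that `fit_transform` accepts every key in the merged mapping.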

Option 3 requires changes in base.py and would probably lead to much more discussion than the other two options.

also cc @jnothman @agramfort

@github-actions github-actions bot added the Needs Triage Issue requires triage label Mar 29, 2022
@thomasjpfan thomasjpfan added Needs Decision - API and removed Needs Triage Issue requires triage labels Mar 29, 2022
@jnothman
Member
jnothman commented Apr 3, 2022

I favour option 2 personally. Certainly in the current codebase, we're only really talking about parameters to fit; and the parameters that we imagine for transform would either be disjoint or be identical in value (i.e. routing the same input to the same name) to those in fit, so merging makes some sense.

@thomasjpfan
Member

I prefer merging fit and transform. (Option 2)

@lorentzenchr
Member
lorentzenchr commented Apr 4, 2022

I also favor option 2, merging metadata routing for fit and transform.

@adrinjalali
Member Author

Fixed in #26506
