-
-
Notifications
You must be signed in to change notification settings - Fork 25.9k
[MRG] Allow only fit_transform to be present in pipeline #16714
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
This makes it possible to plug in algorithms like TSNE in the pipeline, although they do not support `transform`. A warnign will be raised that the pipeline is not reusable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! It would be nice if you could add some tests in the sklearn/tests/test_pipeline.py
file to make sure that the warning and the error are raised in relevant cases.
8000
Sure! Didn't see that earlier. |
According to scikit-learn#16714 (comment) changed the syntax but not semantics
According to the style guide
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also change the test for FeatureUnion
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks
Should be taken care of by the user
flake8 E127
I'm a bit unsure why the doc test failed, since I didn't change anything there. Could it be related to #17051? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Merging with master may fix your issue.
Update doc/whats_new/v0.23.rst Co-Authored-By: Thomas J Fan <thomasjpfan@gmail.com>
The CI crashed again, but now everything seems to be working. Let me know if anything needs fixing. |
ping @cmarmo this might be worth a 'waiting for reviewer' tag |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So my only concern here is someone training a pipeline with a tsne in the middle, expecting that they'll then be able to predict with it. They wait an hour for the model to fit and then ... bam! No predictions for you. I don't want to waste a user's time like that.
The only way I can see around that is to have a parameter control whether this feature is available, or going further, to introduce a TransductivePipeline.
Pipeline is very much designed to solve needs around train/test setups, so I don't want to weaken its usability for that use-case.
Your thoughts?
Yes. |
Persons interested in such a usage pattern should open an issue for discussion (not me). Also pinging @cathelineparisot. |
This makes it possible to plug in algorithms like TSNE in the
pipeline, although they do not support
transform
. A warning will beraised that the pipeline is not reusable.
Reference Issues/PRs
Fixes #16710
What does this implement/fix? Explain your changes.
This allows algorithms that only implement a
fit_transform
function, but nottransform
(such as TSNE) to be used in a Pipeline in sklearn.