10000 [MRG] Allow only fit_transform to be present in pipeline by jnboehm · Pull Request #16714 · scikit-learn/scikit-learn · GitHub
[go: up one dir, main page]

Skip to content

[MRG] Allow only fit_transform to be present in pipeline #16714

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 43 commits into
base: main
Choose a base branch
from

Conversation

jnboehm
Copy link
Contributor
@jnboehm jnboehm commented Mar 17, 2020

This makes it possible to plug in algorithms like TSNE in the
pipeline, although they do not support transform. A warning will be
raised that the pipeline is not reusable.

Reference Issues/PRs

Fixes #16710

What does this implement/fix? Explain your changes.

This allows algorithms that only implement a fit_transform function, but not transform (such as TSNE) to be used in a Pipeline in sklearn.

Jan Niklas Böhm added 2 commits March 17, 2020 14:05
This makes it possible to plug in algorithms like TSNE in the
pipeline, although they do not support `transform`.  A warnign will be
raised that the pipeline is not reusable.
Copy link
Contributor
@johannfaouzi johannfaouzi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! It would be nice if you could add some tests in the sklearn/tests/test_pipeline.py file to make sure that the warning and the error are raised in relevant cases.

@jnboehm
Copy link
Contributor Author
jnboehm commented Mar 17, 2020

Thanks! It would be nice if you could add some tests in the sklearn/tests/test_pipeline.py file to make sure that the warning and the error are raised in relevant cases.

Sure! Didn't see that earlier.

Jan Niklas Böhm added 2 commits March 17, 2020 18:40
According to
scikit-learn#16714 (comment)

changed the syntax but not semantics
According to the style guide
Copy link
Contributor
@johannfaouzi johannfaouzi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • You have to update the error message in this test.

  • You have to update the error message in this test.

Jan Niklas Böhm added 2 commits March 18, 2020 14:04
Also change the test for FeatureUnion
Copy link
Member
@jnothman jnothman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks

@jnboehm
Copy link
Contributor Author
jnboehm commented Apr 28, 2020

I'm a bit unsure why the doc test failed, since I didn't change anything there. Could it be related to #17051?

Copy link
Member
@thomasjpfan thomasjpfan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Merging with master may fix your issue.

@jnboehm
Copy link
Contributor Author
jnboehm commented Apr 30, 2020

The CI crashed again, but now everything seems to be working. Let me know if anything needs fixing.

@lucyleeow
Copy link
Member

ping @cmarmo this might be worth a 'waiting for reviewer' tag

Copy link
Member
@jnothman jnothman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So my only concern here is someone training a pipeline with a tsne in the middle, expecting that they'll then be able to predict with it. They wait an hour for the model to fit and then ... bam! No predictions for you. I don't want to waste a user's time like that.

The only way I can see around that is to have a parameter control whether this feature is available, or going further, to introduce a TransductivePipeline.

Pipeline is very much designed to solve needs around train/test setups, so I don't want to weaken its usability for that use-case.

Your thoughts?

Base automatically changed from master to main January 22, 2021 10:52
@cmarmo cmarmo added the Needs Decision Requires decision label Nov 30, 2021
@lorentzenchr
Copy link
Member

Shall we open a dedicated issue to discuss such usage patterns? It also popped up in #22162.

I agree with @jnothman

Pipeline is very much designed to solve needs around train/test setups, so I don't want to weaken its usability for that use-case.

@thomasjpfan
Copy link
Member

Shall we open a dedicated issue to discuss such usage patterns? It also popped up in #22162.

Yes.

@lorentzenchr
Copy link
Member

Persons interested in such a usage pattern should open an issue for discussion (not me).

Also pinging @cathelineparisot.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Pipeline requires both fit and transform method to be available instead of only fit_transform
7 participants
0