8000 Allow the creation of a pipeline for non-parametric functions like TSNE by cathelineparisot · Pull Request #22162 · scikit-learn/scikit-learn · GitHub
[go: up one dir, main page]

Skip to content

Allow the creation of a pipeline for non-parametric functions like TSNE #22162

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 2 commits into from

Conversation

cathelineparisot
Copy link

Reference Issues/PRs

Fixes #16710. See also #16714.

What does this implement/fix? Explain your changes.

Calling a pipeline for a non-parametric function like TSNE() returns an error because the transform() function is missing. However, for non-parametric functions, there is no transform() method because there is no projection or mapping, yet we can use dimensionality reduction.

So, we must be able to create a pipeline even if the transform() method does not exist. This PR allows it

Any other comments?

However, I have the impression that the constructor of the current library no longer calls "validate_step()" as I added it in the "init". What is the problem? It was this function that was creating the error and I have modified this function. I am waiting for your feedback.

@cathelineparisot cathelineparisot changed the title [fix]allow-when-only-fit-transform #16710[fix]allow-when-only-fit-transform Jan 10, 2022
Copy link
Member
@thomasjpfan thomasjpfan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the PR @cathelineparisot ! I think we still need an answer to #16714 (review) before we can continue. I currently agree with the comment:

Pipeline is very much designed to solve needs around train/test setups, so I don't want to weaken its usability for that use-case.

A user can spend a long time fitting a pipeline with a TNSE in the middle which works because TSNE has fit_transform. pipe.predict would fail because a step in the pipeline does not define transform. This is a poor user experience because they can spend a long time time in fitting but only finds out at the end that 8000 they can not use it for prediction.

@cathelineparisot
Copy link
Author

Thank you for the PR @cathelineparisot ! I think we still need an answer to #16714 (review) before we can continue. I currently agree with the comment:

Pipeline is very much designed to solve needs around train/test setups, so I don't want to weaken its usability for that use-case.

A user can spend a long time fitting a pipeline with a TNSE in the middle which works because TSNE has fit_transform. pipe.predict would fail because a step in the pipeline does not define transform. This is a poor user experience because they can spend a long time time in fitting but only finds out at the end that they can not use it for prediction.

Thank you for your feedback. I understand what you mean. In this case, setting a warning might already be a solution (as it was done in #16714). It is also necessary to see if creating a Pipeline with a TNSE is useful in other cases or only to make a prediction. I don't have the knowledge for that.

@thomasjpfan
Copy link
Member

Please update the title to better describe what the PR does. (Same comment for your other #22165 PR)

@cathelineparisot cathelineparisot changed the title #16710[fix]allow-when-only-fit-transform Allow the creation of a pipeline for non-parametric functions like TSNE Jan 12, 2022
@cathelineparisot
Copy link
Author

Please update the title to better describe what the PR does. (Same comment for your other #22165 PR)

I changed the title of my two pull requests. I hope it suits you. I apologize for the bad titles.

@thomasjpfan
Copy link
Member

I am closing this PR in favor of #16714 since that PR contains tests and has more discussion around the topic. Thank you for looking into this issue!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Pipeline requires both fit and transform method to be available instead of only fit_transform
2 participants
0