8000 Pipeline requires both fit and transform method to be available instead of only fit_transform · Issue #16710 · scikit-learn/scikit-learn · GitHub
[go: up one dir, main page]

Skip to content

Pipeline requires both fit and transform method to be available instead of only fit_transform #16710

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
jnboehm opened this issue Mar 16, 2020 · 11 comments · May be fixed by #16714
Open

Pipeline requires both fit and transform method to be available instead of only fit_transform #16710

jnboehm opened this issue Mar 16, 2020 · 11 comments · May be fixed by #16714
Labels
Bug Low Priority Low priority issues and pull requests module:pipeline

Comments

@jnboehm
Copy link
Contributor
jnboehm commented Mar 16, 2020

Describe the bug

Calling a pipeline with a nonparametric function causes an error since the function transform() is missing. The pipeline itself calls the function fit_transform() if it's present. For nonparametric functions (the most prominent being t-SNE) a regular transform() method does not exist since there is no projection or mapping that is learned. It could still be used for dimensionality reduction.

Steps/Code to Reproduce

Example:

from sklearn.decomposition import PCA
from sklearn.manifold import TSNE
from sklearn.pipeline import make_pipeline

make_pipeline(TSNE(), PCA())

Expected Results

A pipeline is created.

Actual Results

Output:

TypeError: All intermediate steps should be transformers and implement fit and transform or be the string 'passthrough' 'TSNE(angle=0.5,...

Possible Solution

Editing this https://github.com/scikit-learn/scikit-learn/blob/95d4f0841/sklearn/pipeline.py#L179
to the following in order to reflect the change:

if (not hasattr(t, "fit_transform")) or not (hasattr(t, "fit") and hasattr(t, "transform")):

Versions

import sklearn; sklearn.show_versions()
System:
    python: 3.8.2 (default, Feb 26 2020, 22:21:03)  [GCC 9.2.1 20200130]
executable: /usr/bin/python3
   machine: Linux-5.5.9-arch1-2-x86_64-with-glibc2.2.5

Python dependencies:
       pip: 20.0.2
setuptools: 46.0.0
   sklearn: 0.22.2.post1
     numpy: 1.18.1
     scipy: 1.4.1
    Cython: 0.29.15
    pandas: 1.0.1
matplotlib: 3.2.0
    joblib: 0.14.1

Built with OpenMP: True
@jnothman
Copy link
Member

Yes, tsne, at least in our conception and implementation of it, does not induce a model that can be applied to a test set..

@jnothman
Copy link
Member

Tsne is transductive only

@jnboehm
Copy link
Contributor Author
jnboehm commented Mar 16, 2020

That's a fair point, I was trying to use it for a single pass pipeline, essentially. I see how the lack of an internal model could become an issue once test data is introduced. Anyways, thanks for the explanation.

@jnothman jnothman reopened this Mar 17, 2020
@jnothman
Copy link
Member

Oh, I didn't note that you made no attempt to run any methods on the pipeline. I agree that it should be possible to construct the pipeline and run fit_transform on it, even if it is a somewhat degenerate pipeline. Thanks for reporting!

@jnothman jnothman added Bug help wanted Low Priority Low priority issues and pull requests Easy Well-defined and straightforward way to resolve and removed Bug: triage labels Mar 17, 2020
@jnboehm
Copy link
Contributor Author
jnboehm commented Mar 17, 2020

In that case, what should the behavior be? Should it check the transform property of the transformer and raise a TypeError before trying to call the function here? Or should it warn at construction time that only fit_transform is available (or that this pipeline can only be called once)?

@johannfaouzi
Copy link
Contributor
johannfaouzi commented Mar 17, 2020

I would go for the latter. It's a degenerate pipeline but users would expect it to work.

It would require to change https://github.com/scikit-learn/scikit-learn/blob/95d4f0841/sklearn/pipeline.py#L179-L180 so that, if an intermediate step has only fit_transform, the available methods for the pipeline are:

  • fit (as always),
  • fit_transform if the last step is a transformer (i.e. has a transform method),
  • fit_predict if the last step has a fit_predict method.

An error should probably be raised if the last step is not compatible (i.e. is a predictor without a fit_predict method).

Edit: TSNE could be the last estimator of the pipeline, so the check should be run for all the steps. It might a good idea to perform a check on all the estimators first, to see if there is an estimator with limitations (only fit_transform for any transformer), then perform a if else loop for both cases (we could keep the current code for one case, we raise a warning for the other case).

@jnothman
Copy link
Member
jnothman commented Mar 17, 2020 via email

@jnboehm
Copy link
Contributor Author
jnboehm commented Mar 17, 2020

I personally would just opt to warn in case an intermediate step only supports fit_transform.

Is my assumption correct that the @if_delegate_has_method would take care of raising errors already? Then simply warning during construction and letting it crash if the user specifies an unsupported operation would in my opinion be sufficient. Or should more cases be handled after the warning?

Since fit_transform will be preferred over transform I'm not sure whether the code needs to be changed for that at all.

Also, in what way do validate_steps and validate_transformers differ? This is 8000 a bit unclear to me. For now I'll change both functions.

@jnothman
Copy link
Member
jnothman commented Mar 17, 2020 via email

@nachidave
Copy link

Hi,
I am a beginner I also want to to contribute to open source so can you guys give me a suggestion please from where to start

@raisa1921
Copy link

p-1
p-2
I'm having this error, can you please tell me how to solve it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
0