8000 TransforemdTargetRegressor fit calls to transformer.fit twice · Issue #11618 · scikit-learn/scikit-learn · GitHub
[go: up one dir, main page]

Skip to content

TransforemdTargetRegressor fit calls to transformer.fit twice #11618

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
vahidbas opened this issue Jul 18, 2018 · 6 comments · Fixed by #11641
Closed

TransforemdTargetRegressor fit calls to transformer.fit twice #11618

vahidbas opened this issue Jul 18, 2018 · 6 comments · Fixed by #11641

Comments

@vahidbas
Copy link

Description

Once transformer fit is done here and again here during fit_transform. I don't think this is intended as fitting can be costly and there is no reason to repeat it for the same data.

Steps/Code to Reproduce

from sklearn.base import TransformerMixin, BaseEstimator
from sklearn.compose import TransformedTargetRegressor
from sklearn.tree import DecisionTreeRegressor
from sklearn import datasets

class DummyTransformer(BaseEstimator, TransformerMixin):
    def __init__(self):
        pass
    
    def fit(self, X):
        print('in fit')
        return self
    
    def transform(self, X):
        print('in transform')
        return X
    
    def inverse_transform(self, X):
        print('in inverse_transform')
        return X
    
    
estimator = TransformedTargetRegressor(
    regressor=DecisionTreeRegressor(),
    transformer=DummyTransformer(),
    check_inverse = False
)

X, y = datasets.load_linnerud(return_X_y=True)

estimator.fit(X, y)

returns:

in fit
in fit
in transform

Expected Results

in fit
in transform
@jnothman
Copy link
Member

This is intentionally used to check the inversion as a form of input validation. It can be disabled with a parameter. In what context is this severely problematic?

@vahidbas
Copy link
Author

@jnothman

This is intentionally used to check the inversion as a form of input validation. It can be disabled with a parameter

then shouldn't self.transformer_.fit(y) at line 133 be inside the if statement if self.check_inverse: at line 134?
in my example in the description check_inverse is False but still the fit is called twice.

In what context is this severely problematic?

Let say I have a transformer with heavy fit (e.g. sparse dictionary learning) and I am doing grid search cross-validation for hyper parameter tuning which includes transformer parameter as well. I wouldn't want to fit my transformer twice every time.

@jnothman jnothman reopened this Jul 19, 2018
@jnothman
Copy link
Member

No, the check_inverse logic should be in FunctionTransformer, but I agree that if your snippet is right, this seems to be a bug

@jnothman
Copy link
Member

No maybe it's not about FunctionTransformer. I'm not sitting down now to try replicate/debug

@glemaitre
Copy link
Member

The fitting should be done in the check_inverse condition.
I opened a PR.

@glemaitre
Copy link
Member

@vahidbas Thanks for reporting

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants
0