8000 False alarm NotFittedError in transform for legacy OneHotEncoder · Issue #12680 · scikit-learn/scikit-learn · GitHub
[go: up one dir, main page]

Skip to content

False alarm NotFittedError in transform for legacy OneHotEncoder #12680

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
jnothman opened this issue Nov 26, 2018 · 6 comments
Closed

False alarm NotFittedError in transform for legacy OneHotEncoder #12680

jnothman opened this issue Nov 26, 2018 · 6 comments
Labels
Milestone

Comments

@jnothman
Copy link
Member

Reported by @arnau126 at #12443 (comment):

This change breaks transform in _legacy_mode. It raises NotFittedError when it's actually fitted. This is because check_is_fitted checks categories_ when _legacy_transform doesn't use it at all.

In _legacy_mode we should check _feature_indices_, _n_values_ and _active_features_ instead.

I propose:

def transform(self, X):
   if self._legacy_mode:
       check_is_fitted(self, ('_feature_indices_', '_n_values_', '_active_features_'))
       return _transform_selected(X, self._legacy_transform, self.dtype,
                                                   self._categorical_features, copy=True)
   else:
       check_is_fitted(self, 'categories_')
       return self._transform_new(X)

I'm surprised we don't already have tests for this case. @arnau126, please submit a minimal verifiable example, or just a test case by which we can confirm the error and your fix. Or submit a pull request with your fix and a test.

@jnothman jnothman added the Bug label Nov 26, 2018
@jnothman jnothman added this to the 0.20.2 milestone Nov 26, 2018
@arnau126
Copy link
arnau126 commented Nov 27, 2018

Months ago I fitted a OneHotEncoder in scikit-learn==0.19.1 and I stored the object in a pickle.
Last week I updated scikit-learn to 0.20.0, and I wanted to keep using the pickled object.
I cannot refit it because I don't have the original data.

So I manually adapted the object by doing:

ohe._feature_indices_ = ohe.feature_indices_
ohe._active_features_ = ohe.active_features_
ohe._n_values_ = ohe.n_values_
ohe._n_values = ohe.n_values
ohe._legacy_mode = True

(I didn't set categories_ attribute because I saw that _legacy_transform didn't use it)

This patch worked and passed all my internal tests.

But now, I've updated to 0.20.1 and a NotFittedError is raised every time I call transform because of categories_ which is actually not necessary.

I don't know if my particular case worth a PR. If does, I'll be willing to submit it.

@jnothman
Copy link
Member Author

So you're saying this applies to an encoder unpickled from 0.19? We don't claim to support that, unfortunately...

@arnau126
Copy link

Okay. Thanks anyway.

@amueller
Copy link
Member

So close? I'm pretty sure legacy_mode is tested so I don't see how this could happen (not saying it couldn't).

@arnau126
Copy link
arnau126 commented Nov 27, 2018

This happens when in 0.20.1 you unpickle a OneHotEncoder from 0.19.1.

The unpickled encoder is fitted, but a NotFittedError is raised because of the check_is_fitted(self, 'categories_'). If I remove this check, the encoder works correctly as legacy_mode doesn't use categories_.

Hope my explanation is understood (let me know if not) . However I understand that you don't claim to support that.

@amueller
Copy link
Member

Yeah, sorry, I don't think we want to try to support that as it opens a giant can of worms.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants
0