8000 ENH Adds InconsistentVersionWarning when unpickling version doesn't match by thomasjpfan · Pull Request #25297 · scikit-learn/scikit-learn · GitHub
[go: up one dir, main page]

Skip to content

ENH Adds InconsistentVersionWarning when unpickling version doesn't match #25297

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 7 commits into from
Jan 12, 2023

Conversation

thomasjpfan
Copy link
Member
@thomasjpfan thomasjpfan commented Jan 5, 2023

Reference Issues/PRs

Closes #25273
Alternative to #25280

What does this implement/fix? Explain your changes.

This PR adds more information to the warning, so a user can record the warning and get information about the version used to pickle the estimator:

with warnings.catch_warnings(record=True) as w:
    est = pickle.loads(...)
    
warning = w[0].message
print(warning.original_sklearn_version)

One can also turn the warning into an error and get the original_sklearn_version from there:

warnings.simplefilter("error", InconsistentVersionWarning)

try:
    est = pickle.loads(...)
except InconsistentVersionWarning as w:
    print(w.original_sklearn_version)

Any other comments?

This PR does not add any more state of the estimator itself.

Copy link
Member
@adrinjalali adrinjalali left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Otherwise I like the PR.

@glemaitre
Copy link
Member

I am split here. The change is simple. However, the solution is a bit in between parsing the warning message and just having a nice attribute.

Since the other solution involves adding a __new__ that is scary, the current solution might be enough. We should only document it in the pickling section to provide a code snippet that just works to get the pickling version.

@betatim
Copy link
Member
betatim commented Jan 5, 2023

I like it. Same question as Adrin about the parsing vs separate arguments.

A curiosity: when I load an estimator pickled in an older version of sklearn in ipython I see the same warning twice. Does someone know if this is a "weird thing" related to ipython or normal behaviour or something totally different?

A separate thing (so we can punt it to a new PR if we want to) that I find confusing about this is that a estimator pickled with v1 and then unpickled with this PR (or any future version) will return {..., "_sklearn_version": '1.3.dev0'}. This was also mentioned in the original issue (#22055). It seems weird and easy to fix. It happens because __getstate__ unconditionally adds _sklearn_version to the state dict (return dict(state.items(), _sklearn_version=__version__)), instead I think it should check if the key is present and only set it if it isn't present.

@adrinjalali
Copy link
Member

instead I think it should check if the key is present and only set it if it isn't present.

@betatim the key is never present (and I think it's a good thing) since __setstate__ pops it before setting the attributes. If the user saves a model which was loaded from an older sklearn version, we probably don't care about the usecase, do we? I'm happy with this solution.

@glemaitre
Copy link
Member

Shall we document the way to programmatically recover the pickle version in the persistence documentation?

@betatim
Copy link
Member
betatim commented Jan 6, 2023

Shall we document the way to programmatically recover the pickle version in the persistence documentation?

Sounds like a good idea. If there is a documented way to get the version used to fit an estimator that reduces the number of people calling __getstate__ to find out.

I hadn't realised that __setstate__ pops the variable, and I did quickly read the code ... . Combined with the fact that it doesn't contain what people are looking for when they poke around, having a documented way of getting the right answer is good.

Copy link
Member
@glemaitre glemaitre left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then, LGTM.

thomasjpfan and others added 2 commits January 11, 2023 14:06
Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com>
@glemaitre glemaitre merged commit 70c690e into scikit-learn:main Jan 12, 2023
glemaitre added a commit to glemaitre/scikit-learn that referenced this pull request Jan 12, 2023
…atch (scikit-learn#25297)

Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

__sklearn_pickle_version__ makes estimator.__dict__.keys() == loaded.__dict__.keys() to fail
4 participants
0