8000 [MRG] proposal to recommend ".joblib" file extension for load/dump by yufengg · Pull Request #11230 · scikit-learn/scikit-learn · GitHub
[go: up one dir, main page]

Skip to content

[MRG] proposal to recommend ".joblib" file extension for load/dump #11230

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Jun 18, 2018

Conversation

yufengg
Copy link
Contributor
@yufengg yufengg commented Jun 10, 2018

I'm proposing the use of filename.joblib instead of filename.pkl for models persisted via the joblib library. This will make it easier for model sharing and reduce confusion when it comes to time load a model, as it will be more clear whether a file was saved using the pickle or joblib library.

Reference Issues/PRs

What does this implement/fix? Explain your changes.

Any other comments?

I'm proposing the use of `filename.joblib` instead of `filename.pkl` for models persisted via the joblib library. This will make it easier for model sharing and reduce confusion when it comes to time load a model, as it will be more clear whether a file was saved using the `pickle` or `joblib` library.
@glemaitre
Copy link
Member

I am not sure about that. Files are also pickled.

@jnothman
Copy link
Member
jnothman commented Jun 11, 2018 via email

@lesteve
Copy link
Member
lesteve commented Jun 13, 2018

is there no incompatibility in loading joblib pickles with pickle.load?

Short answer no.

Longer answer: starting in joblib 0.10 (released in July 2016) files produced by joblib.dump can not be read with pickle.load if the dumped object "contains" a numpy array. If the dumped object does not "contain" a numpy array, then it is a standard Python pickle and can be read with pickle.load

import pickle
import joblib
import numpy as np

filename = '/tmp/test.pkl'
joblib.dump([1, 2, 3], filename)
pickle.load(open(filename, 'rb'))  # works fine
joblib.dump(np.array([1, 2, 3]), filename)
pickle.load(open(filename, 'rb'))  # UnpicklingError: invalid load key, '\x01'.

@lesteve
Copy link
Member
lesteve commented Jun 13, 2018

To sum up, I am fine with changing the extension in the example, maybe .jbl rather than .joblib?

@rth
Copy link
Member
rth commented Jun 13, 2018

is there no incompatibility in loading joblib pickles with pickle.load?

Short answer no.

To summarize the triple negation - it's not strictly compatible :)

maybe .jbl rather than .joblib?

https://datatypes.net/open-jbl-files: .jbl seems to be an already existing extension.

@lesteve
Copy link
Member
lesteve commented Jun 13, 2018

I should have guessed: any combination of 3 letters is already taken with high probability ;-). Let's go for .joblib then!

Copy link
Member
@qinhanmin2014 qinhanmin2014 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks @yufengg

@qinhanmin2014
Copy link
Member

Please also update other similar places in the repo (e.g., doc/tutorial/basic/tutorial.rst)

@ogrisel
Copy link
Member
ogrisel commented Jun 18, 2018

LGTM as well. I did the change in the master branch of joblib. Let's merge this.

@ogrisel ogrisel merged commit 355d880 into scikit-learn:master Jun 18, 2018
@lesteve
Copy link
Member
lesteve commented Jun 18, 2018

I pushed a similar change in 066b501 for doc/tutorial/basic/tutorial.rst.

georgipeev pushed a commit to georgipeev/scikit-learn that referenced this pull request Jun 20, 2018
Recommend to use of `filename.joblib` instead of `filename.pkl` for models persisted via the joblib library to reduce confusion when it comes to time load a model, as it will be more clear whether a file was saved using the `pickle` or `joblib` library.
@yufengg
Copy link
Contributor Author
yufengg commented Aug 22, 2018

Sorry I'm not familiar with the doc site publishing workflow -- is there any additional actions for me to take to reflect the changes on the website? It is still showing the pre-merge content.

http://scikit-learn.org/stable/modules/model_persistence.html

@jnothman
Copy link
Member

Look at /dev rather than /stable. We will release real soon now...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants
0