[MRG] proposal to recommend ".joblib" file extension for load/dump #11230
I'm proposing the use of `filename.joblib` instead of `filename.pkl` for models persisted via the joblib library. This will make model sharing easier and reduce confusion when it comes time to load a model, as it will be clearer whether a file was saved using the `pickle` or `joblib` library.
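As a rough illustration of why a distinct extension helps, here is a minimal sketch of a loader that picks the library from the file suffix. The helper names `loader_for` and `load_model` are hypothetical and not part of joblib or scikit-learn; the sketch assumes only the two extensions discussed here.

```python
import pickle
from pathlib import Path


def loader_for(path):
    """Return the name of the library that should load `path`, based on its suffix."""
    suffix = Path(path).suffix.lower()
    if suffix == ".joblib":
        return "joblib"
    if suffix == ".pkl":
        return "pickle"
    raise ValueError(f"unrecognized model file extension: {suffix!r}")


def load_model(path):
    """Load a persisted object with the library its extension points to (hypothetical helper)."""
    if loader_for(path) == "joblib":
        import joblib  # imported lazily so pickle-only setups do not need it

        return joblib.load(path)
    with open(path, "rb") as f:
        return pickle.load(f)
```

With a `.pkl` extension used for both libraries, a dispatch like this would have nothing to go on, which is the confusion the proposal aims to reduce.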
I am not sure about that. Files are also pickled.
Perhaps you're right: is there no incompatibility in loading joblib pickles with `pickle.load`?
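Short answer: no. Longer answer: starting in joblib 0.10 (released in July 2016), files produced by `joblib.dump` can be loaded with `pickle.load` only if they contain no numpy arrays:

```python
import pickle
import joblib
import numpy as np

filename = '/tmp/test.pkl'
# A plain list dumped by joblib can be read back with pickle.load.
joblib.dump([1, 2, 3], filename)
with open(filename, 'rb') as f:
    pickle.load(f)  # works fine
# A numpy array dumped by joblib cannot.
joblib.dump(np.array([1, 2, 3]), filename)
with open(filename, 'rb') as f:
    pickle.load(f)  # UnpicklingError: invalid load key, '\x01'.
```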
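The failure mode above can be probed without joblib installed. The following is a minimal sketch, assuming a hypothetical helper `plain_pickle_can_read`; the "bad" bytes are a made-up stand-in (a pickle protocol header followed by non-pickle bytes), not real joblib output.

```python
import io
import pickle


def plain_pickle_can_read(stream):
    """Return True if pickle.load can read one object from `stream`."""
    try:
        # Only call this on trusted data: unpickling can run arbitrary code.
        pickle.load(stream)
    except (pickle.UnpicklingError, EOFError):
        return False
    return True


# A regular pickle stream is readable.
ok = plain_pickle_can_read(io.BytesIO(pickle.dumps([1, 2, 3])))

# Made-up stand-in for a file pickle cannot parse: a protocol header
# followed by bytes that are not valid pickle opcodes.
bad = plain_pickle_can_read(io.BytesIO(b"\x80\x04\x01\x00\x00"))
```

A check like this only tells you that plain `pickle` fails; it does not prove the file came from joblib, which is one more argument for encoding that in the extension.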
To sum up, I am fine with changing the extension in the example, maybe
To summarize the triple negation - it's not strictly compatible :)
https://datatypes.net/open-jbl-files: `.jbl` seems to be an already existing extension.
I should have guessed: any combination of 3 letters is already taken with high probability ;-). Let's go for `.joblib` then!
LGTM, thanks @yufengg
Please also update other similar places in the repo (e.g., doc/tutorial/basic/tutorial.rst)
LGTM as well. I did the change in the master branch of joblib. Let's merge this.
I pushed a similar change in 066b501 for
Recommend the use of `filename.joblib` instead of `filename.pkl` for models persisted via the joblib library, to reduce confusion when it comes time to load a model, as it will be clearer whether a file was saved using the `pickle` or `joblib` library.
Sorry, I'm not familiar with the doc site publishing workflow. Are there any additional actions for me to take to reflect the changes on the website? It is still showing the pre-merge content: http://scikit-learn.org/stable/modules/model_persistence.html
Look at /dev rather than /stable. We will release real soon now...