-
-
Notifications
You must be signed in to change notification settings - Fork 25.9k
[MRG+2] Model persistence doc #3317
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Apparently the travis box ran out of memory while running tests under python 3. Lots of tracebacks ending in
The builds for python 2.6/2.7 passed without errors |
Travis servers seem to have been relatively unstable lately... sorry i On 25 June 2014 23:25, pignacio notifications@github.com wrote:
|
No rush :) I just got curious when the build failed, and once I found why, i guessed it wouldn't harm posting it here |
Yes tests i 8000 nvolving forks started to fail randomly since a couple of days on travis. They use to be fine for the past year. I hope this is just temporary. Would be interesting to check wether those failures correlate with the time where travis active builds reach 100% worker capacity: http://status.travis-ci.com/ |
|
||
* Never unpickle untrusted data | ||
* Models saved in one version of scikit-learn might not to load in another | ||
version. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would add:
It is strongly advised to save metadata information along with the model pickle files about the training data (such as the reference to an immutable snapshot), the Python source of the code used to train the model, the versions of scikit-learn and its dependencies and finally the cross-validation score obtained on the training data. This should make it possible to rebuild a similar model with future versions of scikit-learn and check that the cross-validation score is still in the same range.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 for more detail on bookkeeping with pickles.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@pignacio My phrasing is bad. Maybe rephrase that long conjunction as a 4 items bullet list to improve readability.
@ogrisel there's the metadata paragraph On the other hand, general question: I'd love to squash the fixup commits into the older commits to get them out of the way. Any ideas if doing that will break in any way the PR? I've never done that before in GH. We can always do the squashing right before the merge, tough. |
As long as you're opening a new PR, it shouldn't be a problem. I often On 26 June 2014 11:37, pignacio notifications@github.com wrote:
|
LGTM! |
+1 for merge. @jnothman I don't understand your remark about opening a new PR. One could squash the commits and force push to update the existing PR. I don't think this is a problem here as this only about documentation changes and AFAIK nobody is working on a concurrent fork of this branch. |
Sorry, I thought @pignacio was talking about doing so on another PR. Yes, On 27 June 2014 06:30, Olivier Grisel notifications@github.com wrote:
|
@pignacio if you would like to do the squashing yourself, I shall leave it to you. Otherwise I'm happy to squash and merge this shortly. |
@jnothman There we go. Is that ok? |
Sure. I would have just squashed all your commits, but it's fine. |
DOC Documentation for model persistence
Thanks! |
Thanks @pignacio for finishing up this doc fix. |
Taking over #3084. Fixes #1332
Comments on #3084 should be addressed in this patch.
I took the liberty and simplified the "Security and maintenance limitations" section as it felt overly verbose for me.
Feel free to comment any improvements/corrections on this