[MRG+2] Model persistence doc #3317

pignacio · 2014-06-26T02:41:03Z

Taking over #3084. Fixes #1332

Comments on #3084 should be addressed in this patch.
I took the liberty of simplifying the "Security and maintenance limitations" section, as it felt overly verbose to me.

Feel free to suggest any improvements or corrections.

pignacio · 2014-06-26T03:25:29Z

Apparently the travis box ran out of memory while running tests under python 3. Lots of tracebacks ending in

  File "/home/travis/anaconda/envs/testenv/lib/python3.4/multiprocessing/popen_fork.py", line 70, in _launch
    self.pid = os.fork()
OSError: [Errno 12] Cannot allocate memory

The builds for python 2.6/2.7 passed without errors

jnothman · 2014-06-26T03:48:47Z

Travis servers seem to have been relatively unstable lately... Sorry, I
haven't looked at your changes yet.

pignacio · 2014-06-26T07:37:53Z

sorry i haven't looked at your changes yet.

No rush :)

I just got curious when the build failed, and once I found out why, I figured it wouldn't hurt to post it here.

ogrisel · 2014-06-26T08:19:50Z

Apparently the travis box ran out of memory while running tests under python 3.

Yes, tests involving forks started to fail randomly on Travis a couple of days ago. They had been fine for the past year. I hope this is just temporary.

It would be interesting to check whether those failures correlate with the times when Travis active builds reach 100% worker capacity: http://status.travis-ci.com/

ogrisel · 2014-06-26T08:28:27Z

doc/modules/model_persistence.rst

+
+* Never unpickle untrusted data
+* Models saved in one version of scikit-learn might not to load in another
+  version.


I would add:

It is strongly advised to save metadata information along with the model pickle files about the training data (such as the reference to an immutable snapshot), the Python source of the code used to train the model, the versions of scikit-learn and its dependencies and finally the cross-validation score obtained on the training data. This should make it possible to rebuild a similar model with future versions of scikit-learn and check that the cross-validation score is still in the same range.

+1 for more detail on bookkeeping with pickles.

@pignacio My phrasing is bad. Maybe rephrase that long conjunction as a 4-item bullet list to improve readability.
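
For illustration, the bookkeeping suggested above could look roughly like the sketch below. It is not part of the patch: the helper names (`save_with_metadata`, `version_mismatches`), the file names, the JSON sidecar layout and the list of tracked packages are all assumptions made for the example, and only the standard library is used.

```python
import json
import pickle
import platform
from importlib import metadata
from pathlib import Path

# Packages whose versions get recorded; this list is an assumption.
TRACKED_PACKAGES = ("scikit-learn", "numpy", "scipy")


def package_version(name):
    """Return the installed version of a package, or None if it is absent."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def save_with_metadata(model, directory, data_snapshot, training_script, cv_score):
    """Pickle `model` and write a JSON sidecar describing how it was built."""
    directory = Path(directory)
    model_path = directory / "model.pkl"        # hypothetical file name
    meta_path = directory / "model.meta.json"   # hypothetical file name
    with open(model_path, "wb") as f:
        pickle.dump(model, f)
    meta = {
        "data_snapshot": data_snapshot,      # reference to an immutable snapshot
        "training_script": training_script,  # source used to train the model
        "python_version": platform.python_version(),
        "versions": {name: package_version(name) for name in TRACKED_PACKAGES},
        "cv_score": cv_score,                # score obtained on the training data
    }
    meta_path.write_text(json.dumps(meta, indent=2))
    return model_path, meta_path


def version_mismatches(meta_path):
    """Map package name to (recorded, current) for every version that differs."""
    meta = json.loads(Path(meta_path).read_text())
    return {
        name: (recorded, package_version(name))
        for name, recorded in meta["versions"].items()
        if recorded != package_version(name)
    }
```

Before unpickling in a newer environment, one could call `version_mismatches` on the sidecar file and, if anything differs, retrain from the recorded snapshot and script and compare the new cross-validation score against the stored `cv_score`.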

pignacio · 2014-06-26T15:37:17Z

@ogrisel there's the metadata paragraph

On the other hand, a general question: I'd love to squash the fixup commits into the older commits to get them out of the way. Any idea whether doing that will break the PR in any way? I've never done that before on GitHub. We can always do the squashing right before the merge, though.

jnothman · 2014-06-26T20:43:21Z

As long as you're opening a new PR, it shouldn't be a problem. I often
squash before merge anyway.

jnothman · 2014-06-27T01:36:50Z

LGTM!

ogrisel · 2014-06-27T10:30:43Z

+1 for merge.

@jnothman I don't understand your remark about opening a new PR. One could squash the commits and force push to update the existing PR. I don't think this is a problem here, as this is only about documentation changes and AFAIK nobody is working on a concurrent fork of this branch.

jnothman · 2014-06-27T10:59:15Z

Sorry, I thought @pignacio was talking about doing so on another PR. Yes,
sure, you can squash and force-update, or the person merging it can squash
and just merge it into master without updating the PR.

jnothman · 2014-06-27T11:06:26Z

@pignacio if you would like to do the squashing yourself, I shall leave it to you. Otherwise I'm happy to squash and merge this shortly.

pignacio · 2014-06-27T12:02:13Z

@jnothman There we go. Is that ok?

coveralls · 2014-06-27T12:12:37Z

Coverage remained the same when pulling 4b79f22 on pignacio:model_persistence_doc into afcb384 on scikit-learn:master.

jnothman · 2014-06-27T12:52:27Z

Sure. I would have just squashed all your commits, but it's fine.

DOC Documentation for model persistence

jnothman · 2014-06-27T13:03:42Z

Thanks!

ogrisel · 2014-06-27T13:41:10Z

Thanks @pignacio for finishing up this doc fix.

ogrisel reviewed Jun 26, 2014

jnothman changed the title ~~Model persistence doc~~ [MRG+1] Model persistence doc Jun 27, 2014

jnothman changed the title ~~[MRG+1] Model persistence doc~~ [MRG+2] Model persistence doc Jun 27, 2014

Raul Garreta and others added 6 commits June 27, 2014 08:57

added a new section on model persistence

8489c33

model persistence doc, added improvements from ogrisel comments

df27e26

Move model persistence doc inside model selection section

be315ed

Remove trailing whitespace

4c51809

Simplified security and maintenance section

78c0f61

Metadata information for unpickling models in future versions

4b79f22

jnothman added a commit that referenced this pull request Jun 27, 2014

Merge pull request #3317 from pignacio/model_persistence_doc

6460479

DOC Documentation for model persistence

jnothman merged commit 6460479 into scikit-learn:master Jun 27, 2014

This was referenced Jun 27, 2014

[MRG+1] added a new section on model persistence #3084

Closed

model persistence documentation #1332

Closed

pignacio deleted the model_persistence_doc branch June 27, 2014 14:17

jnothman mentioned this pull request Sep 30, 2014

DOC More docs needed on model and data persistence #2801

Closed
