8000 numpy.ma.dump and np.ma.load fail to close the files that they open · Issue #10045 · numpy/numpy · GitHub
[go: up one dir, main page]

Skip to content

numpy.ma.dump and np.ma.load fail to close the files that they open #10045

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
r-owen opened this issue Nov 17, 2017 · 16 comments
Closed

numpy.ma.dump and np.ma.load fail to close the files that they open #10045

r-owen opened this issue Nov 17, 2017 · 16 comments
Labels

Comments

@r-owen
Copy link
r-owen commented Nov 17, 2017

numpy 1.13.1 dump fails to close the file that it opens (as a unit test in our code reported). The code appears to be as follows, which is clearly not closing the file it might open:

def dump(a, F):
    if not hasattr(F, 'readline'):
        F = open(F, 'w')
    return pickle.dump(a, F)
@orbit-stabilizer
Copy link
Contributor

Which file is this code located in?

@charris
Copy link
Member
charris commented Nov 18, 2017

numpy/lib/format.py.

numpy/ma/core.py.

@orbit-stabilizer
Copy link
Contributor

Sorry for my ignorance, I've just started contributing, but I'm confused as to whether we can just add F.close() after we have opened the file. It seems like we cannot since we return F after, so we don't want to close it. Am I thinking about this correctly?

@rkern
Copy link
Member
rkern commented Nov 19, 2017

You are thinking about it correctly, but it's easy enough to put in an if test to guard that case. I'd write it like so:

def dump(a, F):
    close_the_file = False
    if not hasattr(F, 'readline'):
        # Assume this is a filename, so open a new file object.
        close_the_file = True
        F = open(F, 'w')
    try:
        pickle.dump(a, F)
    finally:
        if close_the_file:
            # We own the file object, so clean it up explicitly.
            F.close()

All told, it's probably not a big deal as in all Python implementations under which numpy runs (that I am aware of), when the function returns, F goes out of scope, the opened file object will be cleaned up, and the file handle will be closed at that time.

@eric-wieser
Copy link
Member
eric-wieser commented Nov 19, 2017

Why on earth do any of these exist?

  • np.ma.dump
  • np.ma.load
  • np.ma.dumps
  • np.ma.loads

It seems pretty obvious to me that exposing a slightly-tweaked pickle api is not in scope for what ma should do, especially since these tweaks are broken on python 3 anyway (#5491).

Given that this is already broken on py3, I vote we leave it broken and simply add

warnings.warn(
    "np.ma.{method} is deprecated, use pickle.{method} instead".format(method=the_method_name),
    DeprecationWarning)

@njsmith
Copy link
Member
njsmith commented Nov 19, 2017

All told, it's probably not a big deal as in all Python implementations under which numpy runs (that I am aware of), when the function returns, F goes out of scope, the opened file object will be cleaned up, and the file handle will be closed at that time.

On CPython the opened file object is cleaned up immediately, but other implementations like PyPy generally only guarantee that the file object is cleaned up "eventually". (They don't do refcounting, so all deallocations wait until a GC pass happens.) On such implementations, you can have problems if you e.g. call np.dump in a loop, and run out of file descriptors before the GC runs.

It's not exactly a showstopper urgent bug, but it would be better to explicitly close the file here.

@njsmith njsmith changed the title numpy.dump fails to close the file that it opens numpy.ma.dump fails to close the file that it opens Nov 19, 2017
@njsmith
Copy link
Member
njsmith commented Nov 19, 2017

Oh, it's actually np.ma.dump and it's useless? That makes this feel even less urgent :-)

@eric-wieser
Copy link
Member

Yep, np.dump is pickle.dump with no wrapping whatsoever, which is also a little questionable.

@njsmith
Copy link
Member
njsmith commented Nov 19, 2017

Where do you see that? I get:

In [1]: np.dump
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-1-c704d89a0831> in <module>()
----> 1 np.dump

AttributeError: module 'numpy' has no attribute 'dump'

In [2]: np.__version__
Out[2]: '1.13.3'

@timj
Copy link
timj commented Nov 19, 2017
$ python
Python 3.6.3 |Anaconda, Inc.| (default, Nov  8 2017, 18:10:31) 
[GCC 4.2.1 Compatible Clang 4.0.1 (tags/RELEASE_401/final)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import numpy as np
>>> f = np.array([1,2,3])
>>> f.dump()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: dump() takes exactly 1 argument (0 given)
>>> np.__version__
'1.13.1'
>>> 

The method is in the public API and has not been marked deprecated. On CPython if you use it with warnings enabled you get a ResourceWarning:

ResourceWarning: unclosed file <_io.BufferedWriter name=‘bf_kernel.pkl’>

The fix is trivial (close the file if you opened it) so it would be great if we could add the fix for the next patch release. Shall I make a pull request for this? The example above from @rkern seems fine to me (I don't think pickle.dump() is documented to return anything so the return in the current version of the code doesn't give us anything.

Regarding the comment above about it wrapping pickle with no extra wrapping, it does let you dump to a filename rather than a file handle, which can be handy when you are debugging.

I have no objection to these functions being given a DeprecationWarning for v1.14 if that's where people want to go,although I am guessing there should be a wider discussion than on this issue?

@eric-wieser
Copy link
Member
eric-wieser commented Nov 19, 2017

Where do you see that?

Apologies, I was testing np.loads is pickle.loads, which does return True. You're right, there is no np.dump.

This is super confusing, because np.load and (np.ma.load, np.loads) now refer to different types of serialization.

@eric-wieser
Copy link
Member
eric-wieser commented Nov 19, 2017

it does let you dump to a filename rather than a file handle, which can be handy when you are debugging.

Not in python 3, it doesn't, thanks to #5491. So on that basis, no one is using that featu 8000 re except on python 2.7, which is approaching EOL.

np.save can be used to write directly to a filename too, but is specific to numpy arrays. But it's not numpy.mas job to provide generic serialization helpers!

@orbit-stabilizer
Copy link
Contributor
orbit-stabilizer commented Nov 19, 2017

Ah, I see. Why was pickle.dump(a, F) being returned in the previous case?

Why is this not a good solution?

def dump(a, F): 
    if not hasattr(F, 'readline'):
        with open (F, 'w') as F:
            pickle.dump(a, F)
    else:
        pickle.dump(a, F)

@eric-wieser eric-wieser changed the title numpy.ma.dump fails to close the file that it opens numpy.ma.dump and np.ma.load fail to close the files that they open Nov 19, 2017
@timj
Copy link
timj commented Nov 19, 2017

I'm a bit shocked that it's taken 3 years to not add the b in the open call. @orbit-stabilizer your code does work (with the right wb) but it's a question of whether you duplicate the call to pickle.dump or not. I'm fine with that approach as well.

PS We are using python3.

@eric-wieser
Copy link
Member

@timj: I've o 9E4A f the opinion that if it's been broken since the beginning of time for python 3, and the existance of the function is unjustifiable, we should leave it broken, but with a clearer error message - this is a great time to remove it from the API in anticipation of the Py3 only version of numpy.

@njsmith
Copy link
Member
njsmith commented Nov 19, 2017 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

7 participants
0