8000 BUG?: 1.14 printing requires gc run to clean up all references to printed array · Issue #10620 · numpy/numpy · GitHub
[go: up one dir, main page]

Skip to content

BUG?: 1.14 printing requires gc run to clean up all references to printed array #10620

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
scoder opened this issue Feb 17, 2018 · 23 comments
Closed

Comments

@scoder
Copy link
Contributor
scoder commented Feb 17, 2018

Dear NumPy devs, one of our NumPy integration doctests in Cython has been failing on travis for a couple of weeks. I checked the changelog of CPython and Cython from the time when it started, but neither showed anything remotely related. However, the failing travis nodes use NumPy 1.14, and since there were PyPy related changes in that release, I suspect that this could be the cause. I only see the failures in Py3.5+, and they started in the CPython-dev integration test runs for py3.6-dev and py3.7-dev on travis, however those are built (NumPy is pre-installed in them). Here's an example of a failing run.

The now failing test is simply this:

    cdef int[:, :] array = create_array((4, 5), mode="c", use_callback=True)
    print(np.asarray(array)[::2, ::2])

Meaning, there is a buffer provider (not NumPy), we take a Cython memory view from it, and then wrap that with numpy.asarray() to print it. The difference is when the buffer provider will be deallocated. Previously, it was cleaned up immediately at function exit (and we print a confirmation in a cleanup callback when that happens). With the new NumPy version, it is no longer cleaned up immediately, but only at the next GC run. I added a conditional GC call to the test to trigger that and keep the test from failing (we didn't need a GC run before), but I thought I'd at least bring this change to your attention, in case it's unintentional.

@seberg
Copy link
Member
seberg commented Feb 17, 2018

Sounds like a bug if it is numpy's fault, since there should be no reason to require a gc run, but I don't see why this should happen (not sure about all pypy changes, but those that I remember should be pretty unrelated).
There is a view, an array and the memoryview. The view should reference the array (or maybe the memoryview, but don't think it does here) and the array also the memoryview (stored in array.base).

If CPython has something to do with it, it would have to be something about changes in the buffer/memoryview interface. I expect the same goes for NumPy here.

Or.... there is something going on about the new printing functionality causing some more complex referencing. That seems strange, but I think it might be a real thing to look into. The print itself is required? I.e. if you only print a single item from the array or so it is fine?

@seberg
Copy link
Member
seberg commented Feb 17, 2018

If it is the printing, not sure it is a real bug, but could still be nice to remove at least if it is not hard.

@seberg
Copy link
Member
seberg commented Feb 17, 2018

Yeah, it is the printing. Printing temporarily increases the reference count by one.

@seberg seberg changed the title Different GC behaviour in 1.14 1.14 printing requires gc run to clean up all references to printed array Feb 17, 2018
@seberg seberg changed the title 1.14 printing requires gc run to clean up all references to printed array BUG?: 1.14 printing requires gc run to clean up all references to printed array Feb 17, 2018
@seberg
Copy link
Member
seberg commented Feb 17, 2018

So yeah, I guess its not really a bug, but still might be nice to get rid of if it is easy.

@eric-wieser
Copy link
Member

My guess would be it's the recursion detection, but it's not clear to my why that would hold onto a reference.

@eric-wieser
Copy link
Member

Printing temporarily increases the reference count by one.

Can you give a repro? I can't seem to make this happen on master

@seberg
Copy link
Member
seberg commented Feb 17, 2018

sorry:

#!/usr/bin/env python

import numpy as np
import sys
import gc

gc.disable()

arr = np.arange(2000).reshape(5, 2, -1)

view = arr[::2, ::2]

print(sys.getrefcount(view))
print(view)
print(view)
print(sys.getrefcount(view))
gc.collect()
print(sys.getrefcount(view))

prints out 2 and then 4. Was using 1.14.0 on py 3.6.4, so there is some chance that it got fixed in master, didn't really excpect much changes since 1.14.0....

@seberg
Copy link
Member
seberg commented Feb 17, 2018

4 and then 2 of course ;).

@charris
Copy link
Member
charris commented Feb 17, 2018

Yeah, still in master.

@ahaldane
Copy link
Member
ahaldane commented Feb 17, 2018

It has something to do with the new use of a recursive closure in _formatArray.

Here's the code structure which triggers it, where func plays the role of _formatArray, and closure plays the role of recurse in arrayprint.py :

import sys, gc

def func(a):
    def closure(i):
        if i == 0:
            return a
        return closure(i-1)
    return "val"

gc.disable()
a = np.arange(2)
print(sys.getrefcount(a))
r = func(a)
r = func(a)
print(sys.getrefcount(a))
gc.collect()
print(sys.getrefcount(a))

Interestingly, it only happens if the closure is recursive, and the closure does not even need to be called.

@ahaldane
Copy link
Member

Yeah, I guess this is the expected behavior:

https://bugs.python.org/issue4921

So we just have to reorganize this code to avoid a recursive closure.

@seberg
Copy link
Member
seberg commented Feb 17, 2018

Can't you just delete the closure function when done with it to solve the cycle manually?

@ahaldane
Copy link
Member

Oh, that's much simpler. I'll submit a PR with just that.

@ahaldane
Copy link
Member

After a quick grep I found other recursive closures in numpy: in np.block,and a couple in np.loadtxt. Those examples don't seem too harmful since they only use a few python scalars as closure variables.

@ahaldane
Copy link
Member
ahaldane commented Feb 17, 2018

Hah, also, python2 does not allow me to delete the closure function:

[1]: def func(a):
...:     def closure(i):
...:         if i == 0:
...:             return a
...:         return closure(i-1)
...:     del closure
...:     return "val"
SyntaxError: can not delete variable 'closure' referenced in nested scope

Python3 allows it though.

ahaldane added a commit to ahaldane/numpy that referenced this issue Feb 17, 2018
@eric-wieser
Copy link
Member
eric-wieser commented Feb 18, 2018

Does setting closure = None work, if del closure doesn't? That should be enough to break the reference cycle

@scoder
Copy link
Contributor Author
scoder commented Feb 18, 2018

Or a = None. Seems more explicit to reset the reference to the array than to try to delete some inner function.

@eric-wieser
Copy link
Member
eric-wieser commented Feb 18, 2018

leaving the inner function still leaves a reference cycle though, which we still want to clean up if possible. Setting a = None is addressing the symptom, not the cause.

@ahaldane
Copy link
Member

Both a = None and closure = None seem to work. I agree only closure = None seems to really remove the cycle, though.

I don't very much like adding these extra = None lines though, they seem a little too magical. Maybe we should go with @eric-wieser's idea of a callable class which has the recursive part as a member function, or just leave it as a separate helper function like it is in #10621 right now, and avoid the recursive closures altogether.

Actually, recursive closures seem like something PyPy could be made to account for more nicely. PyPy probably already implements its own version of closures and the co_cellvars table, so it seems there is opportunity to detect and remove the cycle when the closure goes out of scope.... maybe it gets complicated though.

@ahaldane
Copy link
Member

Oh, I guess the problem with my last paragraph is the case where you return the (recursive) closure function. I.e it survives past the scope of the conaining function.

@eric-wieser
Copy link
Member

they seem a little too magical

I think adding # break the reference cycle - bpo-XXXXX explains most of the magic, and I think that might be preferable to passing parameters around in an opaque tuple as you do in #10621.

I went for a recursive closure in the first place because it was less verbose than defining a helper class, and seems like a reasonably useful pattern in general for a function with a recursive implementation that needs to do some initial work.

Even if we fix _formatArray with your proposed fix, we should break the reference cycles manually in the other places we use this pattern.

@ahaldane
Copy link
Member

True, perhaps the = None trick is the more elegant one in the end.

I'm still searching for alternate solutions. This is an interesting thread I'm still reading through, which shows a couple other cute solutions: https://mail.python.org/pipermail/python-dev/2009-December/094439.html

(also shows we may want to put the =None in a finally block)

@ahaldane
Copy link
Member

So after reading that thread, I think the best is to set the closure to None in a finally block. I've updated #10621 accordingly, fixing the 4 recursive closures I found.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants
0