-
-
Notifications
You must be signed in to change notification settings - Fork 10.9k
BUG?: 1.14 printing requires gc run to clean up all references to printed array #10620
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
Sounds like a bug if it is numpy's fault, since there should be no reason to require a gc run, but I don't see why this should happen (not sure about all pypy changes, but those that I remember should be pretty unrelated). If CPython has something to do with it, it would have to be something about changes in the buffer/memoryview interface. I expect the same goes for NumPy here. Or.... there is something going on about the new printing functionality causing some more complex referencing. That seems strange, but I think it might be a real thing to look into. The print itself is required? I.e. if you only print a single item from the array or so it is fine? |
If it is the printing, not sure it is a real bug, but could still be nice to remove at least if it is not hard. |
Yeah, it is the printing. Printing temporarily increases the reference count by one. |
So yeah, I guess its not really a bug, but still might be nice to get rid of if it is easy. |
My guess would be it's the recursion detection, but it's not clear to my why that would hold onto a reference. |
Can you give a repro? I can't seem to make this happen on master |
sorry:
prints out 2 and then 4. Was using 1.14.0 on py 3.6.4, so there is some chance that it got fixed in master, didn't really excpect much changes since 1.14.0.... |
4 and then 2 of course ;). |
Yeah, still in master. |
It has something to do with the new use of a recursive closure in Here's the code structure which triggers it, where import sys, gc
def func(a):
def closure(i):
if i == 0:
return a
return closure(i-1)
return "val"
gc.disable()
a = np.arange(2)
print(sys.getrefcount(a))
r = func(a)
r = func(a)
print(sys.getrefcount(a))
gc.collect()
print(sys.getrefcount(a)) Interestingly, it only happens if the closure is recursive, and the closure does not even need to be called. |
Yeah, I guess this is the expected behavior: https://bugs.python.org/issue4921 So we just have to reorganize this code to avoid a recursive closure. |
Can't you just delete the closure function when done with it to solve the cycle manually? |
Oh, that's much simpler. I'll submit a PR with just that. |
After a quick grep I found other recursive closures in numpy: in np.block,and a couple in np.loadtxt. Those examples don't seem too harmful since they only use a few python scalars as closure variables. |
Hah, also, python2 does not allow me to delete the closure function: [1]: def func(a):
...: def closure(i):
...: if i == 0:
...: return a
...: return closure(i-1)
...: del closure
...: return "val"
SyntaxError: can not delete variable 'closure' referenced in nested scope Python3 allows it though. |
Does setting |
Or |
leaving the inner function still leaves a reference cycle though, which we still want to clean up if possible. Setting |
Both I don't very much like adding these extra Actually, recursive closures seem like something PyPy could be made to account for more nicely. PyPy probably already implements its own version of closures and the co_cellvars table, so it seems there is opportunity to detect and remove the cycle when the closure goes out of scope.... maybe it gets complicated though. |
Oh, I guess the problem with my last paragraph is the case where you return the (recursive) closure function. I.e it survives past the scope of the conaining function. |
I think adding I went for a recursive closure in the first place because it was less verbose than defining a helper class, and seems like a reasonably useful pattern in general for a function with a recursive implementation that needs to do some initial work. Even if we fix |
True, perhaps the I'm still searching for alternate solutions. This is an interesting thread I'm still reading through, which shows a couple other cute solutions: https://mail.python.org/pipermail/python-dev/2009-December/094439.html (also shows we may want to put the |
So after reading that thread, I think the best is to set the closure to |
Dear NumPy devs, one of our NumPy integration doctests in Cython has been failing on travis for a couple of weeks. I checked the changelog of CPython and Cython from the time when it started, but neither showed anything remotely related. However, the failing travis nodes use NumPy 1.14, and since there were PyPy related changes in that release, I suspect that this could be the cause. I only see the failures in Py3.5+, and they started in the CPython-dev integration test runs for py3.6-dev and py3.7-dev on travis, however those are built (NumPy is pre-installed in them). Here's an example of a failing run.
The now failing test is simply this:
Meaning, there is a buffer provider (not NumPy), we take a Cython memory view from it, and then wrap that with
numpy.asarray()
to print it. The difference is when the buffer provider will be deallocated. Previously, it was cleaned up immediately at function exit (and we print a confirmation in a cleanup callback when that happens). With the new NumPy version, it is no longer cleaned up immediately, but only at the next GC run. I added a conditional GC call to the test to trigger that and keep the test from failing (we didn't need a GC run before), but I thought I'd at least bring this change to your attention, in case it's unintentional.The text was updated successfully, but these errors were encountered: