-
-
Notifications
You must be signed in to change notification settings - Fork 10.9k
BUG: Fix incorrect/missing reference cleanups found using valgrind #12624
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
/* | ||
* Note that the buffer info is cached as pyints making them appear like | ||
* unreachable lost memory to valgrind. | ||
*/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Frankly, right now I do not understand the logic of the cache, but I did not try to either and I guess they are very small objects.
c2873d9
to
0e753dc
Compare
I guess some of these could get tests... |
@@ -3255,6 +3255,7 @@ array_datetime_data(PyObject *NPY_UNUSED(dummy), PyObject *args) | |||
} | |||
|
|||
meta = get_datetime_metadata_from_dtype(dtype); | |||
Py_DECREF(dtype); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is probably one of the few ones that could hurt in practice, if someone prints a lot of datetime objects for some reason.
I wonder if there is a nice way to hook pytest to run some special code for each test or simply more specific tests (after maybe just listing them all). To be able to do these runs without a manual bisect of the tests in the file. |
Hmmm, the following code leaks a dtype:
Where the EDIT: OK found the issue. |
1d82476
to
eae11d4
Compare
@seberg Would you recommend backporting this at some point? |
Yeah, I would say so. Most of them are small or very small fish though, so if we want to keep things minimal we could pick out the bigger ones maybe. |
Frankly, a lot of these are even irrelevant except to keep the output clean, such as errors in the tests. |
The bug in the memory overlap tests worries me the most right now. Because it seems to me like we may be losing whole arrays due to memory overlap checks sometimes. EDIT: We were, for reductions with overlap. |
Ah, a few of the odder/bigger errors are due to circular references that are not manually resolved (and numpy's lack of support). |
So, a few of the leaks could in principle be non-harmless, though I guess rarely. Since I do not remember all fixes that were found while searching for other things, we should use the code coverage to judge if new tests are needed anywhere or maybe if not to backport something. |
Py_DECREF(args); | ||
if (item == NULL) { | ||
return 0; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The code above does not seem super pretty, but I am not quite sure that subclasses are not supposed to get a chance here, so PyObject_GetItem
is no behaviour change I think. Oddly the datetime
code looks a bit different than the timedelta
one above.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@seberg - I only looked in detail at umath
, but those certainly are good (apologies for the one I introduced...).
Things are still running, but the current status is probably:
EDIT: add last point. |
That last one (not counting f2py and probably python related ones) is seriously strange, because I can't seem to bisect the test that fails. It is as if there is some crosstalk between multiple tests. Hmmm, lets poke it a bit more, but I may just give up. |
I have cooked down import numpy as np
from numpy.ma.core import masked_array
def test_fancy_printoptions():
t_2d0 = masked_array(data = (0, [[0.0, 0.0, 0.0],
[0.0, 0.0, 0.0]],
0.0),
mask = (False, [[True, False, True],
[False, False, True]],
False),
dtype = "int, (2,3)float, float")
str(t_2d0) # repr also works, str(t_2d0.data) does *not*.
# Removing a single item from the below list fixes it!
# Changing the types between float/int does not change anything
np.array([1., 0., 1., 2., 1., 0., 1., 3.]) and yes the EDIT: "does not help" meaning "does not work to get a simpler example reproducing the lost behaviour". What is lost here are actually float objects, some of which seem created by a numpy cast from object to float. But I am not sure that is guaranteed. So, anyone genius enough to squirt at this and see something that might lead anywhere? It seems completely absurd to me right now, especially why the last array call and the list length can be important. |
Sqirting at the cpython code, pyfloats have a list of float objects flying around, probably there are some shenanigans going on there. |
OK, can't say I fully understand it, but my current guess is that this is because of pythons list of discarded float objects to avoid memory allocations when creating new ones. Since this abuses the slots of the object for linking, I am not sure valgrind would find those as "reachable". On the other hand.... if that is (which I assume), why do we not see a similar thing constantly? |
I took a look since I was touching the float/ma code. Here's a shorter test where I see the leak: import numpy as np
res = np.array((0, [[0., 0., 0.], [0., 0., 0.]], 0),
dtype=[('f0', 'O'), ('f1', 'O', (2, 3)), ('f2', 'O')])
del res
np.array([1., 0., 1., 2., 1., 0., 1., 3.]) I'd bet is has to do with the deallocation of the subarray, which has to decref a bunch of floats. |
Yup, |
Rough possible code, adapted from if (PyDataType_HASSUBARRAY(dtype)) {
PyArray_Dims shape = {NULL, -1};
npy_intp size = 1;
if (!(PyArray_IntpConverter(dtype->subarray->shape, &shape))) {
PyErr_SetString(PyExc_ValueError,
"invalid subarray shape");
return;
}
size = PyArray_MultiplyList(shape.ptr, shape.len);
for (i = 0; i < size; i++){
NPY_COPY_PYOBJECT_PTR(&temp, data + i*sizeof(PyObject*));
Py_XDECREF(temp);
}
} That's as far as I'll get today. |
Ah, very nice cooking down and tracking, sounds like it should be spot-on. Thanks @ahaldane, I guess the print only caused the object cast, makes sense. Possibly the randomness is because of the small array cache needing to be cleared, so we have to create a new array of the right size. |
So knowing this, it is much easier to skip the valgrind part, and just put in a known float 6 times to see its refcount never decrease on array deletion. |
@ahaldane thanks for that code snippet. I simplified it a bit and copied it in, and it seems to work. We really need some tests though. It is also a shame (and praise to masked arrays) that this is only found by the masked array test suit and not the rest of numpy. So, with this, I think this effort is finalized, with the exception of new tests, especially for the last issue and possibly some others based on code coverage or review request (that would be nice anyway). The f2py leaks remain, and may be substantial, but I am not quite willing to dig into it. This hunt has already been time consuming. |
a00d04a
to
370044b
Compare
7ff7b2e
to
0f1500a
Compare
OK, I moved the subarray reference count fix to gh-12650. That should make this PR pretty manageable review wise. |
LGTM, I will squash-merge in a bit (in case someone else wants to take a look). |
The XDECREF is not required, the type cannot be NULL.
496cab8
to
4732c1b
Compare
@mattip added a fixup to change those lines to |
All tests passed, including the flaky one. |
Thanks @seberg |
Nice work. |
I may add more fixes, but right now the output is pretty spammed by some errors for not cleaned up dtypes and I am not sure where that comes from. (compare also gh-12615)
Edit, track things that still leak. I do not think we should wait for all of these to be tracked down, some of them are probably very difficult to find.
test_nditer.py
test_arrayprint.py
(or maybe it were only dtypes to begin with)test_arrayprint.py
test_defchararray.py
test_ufuncs.py
test_memoverlap.py
.test_recfunctions.py
(maybe some tuple stored but not cleaned up by dtypes?) andtest_ufuncs.py
.test_ufunc.py
leaks https://github.com/numpy/numpy/blob/master/numpy/core/src/umath/ufunc_object.c#L2666(probably just the free missing).
test_array_from_pyobj.py
maybe more) f2py tests leakPyFortranObject
stest_datetime.py
Some tests are needed:
[ ] Object dtype subarray reference count test.Hack I am using right now is to run the following script (after cleaning up and running pytest to cache the build with debugging):
to get at least test file specific output of the leaks (of course you can bisect to find a test that triggers manually then, but it is pretty time consuming), would be need to have a better way, but probably not important enough.