-
-
Notifications
You must be signed in to change notification settings - Fork 11k
BUG: Invalid read of size 4 in PyArray_FromFile #7175
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
When the input dtype has a subarray, the dtype is DECREFed by PyArray_NewFromDescr, before dtype->elsize is accessed. If no one else holds a reference to the dtype object, then the dtype object will be destroyed, and dtype->elsize shall not be accessed. This raises an error in Valgrind, and occasionally crashes innocently looking code. e.g. ```numpy.fromfile('filename', dtype=('f8', 3'))``` A workaround would be ```dtype=numpy.dtype(('f8', 3)); numpy.fromfile('filename', dtype=dtype)``` This affects versions as early as 1.9.2 (where I found this bug) and seems to be still relevant today. I hope someone can prove me wrong. Valgrind log: ``` ==17479== Invalid read of size 4 ==17479== at 0x1CD0BB88: UnknownInlinedFun (stdio2.h:295) ==17479== by 0x1CD0BB88: array_fromfile_binary (ctors.c:3177) ==17479== by 0x1CD0BB88: PyArray_FromFile (ctors.c:3304) ==17479== by 0x1CD886B5: array_fromfile (multiarraymodule.c:2040) ==17479== by 0x37AB6E2571: do_call (ceval.c:4327) ==17479== by 0x37AB6E2571: call_function (ceval.c:4135) ==17479== by 0x37AB6E2571: PyEval_EvalFrameEx (ceval.c:2755) ==17479== by 0x37AB6E2665: fast_function (ceval.c:4198) ==17479== by 0x37AB6E2665: call_function (ceval.c:4133) ==17479== by 0x37AB6E2665: PyEval_EvalFrameEx (ceval.c:2755) ==17479== by 0x37AB6E2665: fast_function (ceval.c:4198) ==17479== by 0x37AB6E2665: call_function (ceval.c:4133) ==17479== by 0x37AB6E2665: PyEval_EvalFrameEx (ceval.c:2755) ==17479== by 0x37AB664DCB: gen_send_ex.isra.0 (genobject.c:85) ==17479== by 0x37AB6DE419: PyEval_EvalFrameEx (ceval.c:2586) ==17479== by 0x37AB664DCB: gen_send_ex.isra.0 (genobject.c:85) ==17479== by 0x37AB6DE419: PyEval_EvalFrameEx (ceval.c:2586) ==17479== by 0x37AB6E2665: fast_function (ceval.c:4198) ==17479== by 0x37AB6E2665: call_function (ceval.c:4133) ==17479== by 0x37AB6E2665: PyEval_EvalFrameEx (ceval.c:2755) ==17479== by 0x37AB6E36B3: PyEval_EvalCodeEx (ceval.c:3344) ==17479== by 0x37AB6E25C5: fast_function (ceval.c:4208) ==17479== by 0x37AB6E25C5: call_function (ceval.c:4133) ==17479== by 0x37AB6E25C5: PyEval_EvalFrameEx (ceval.c:2755) ==17479== Address 0x113bef20 is 32 bytes inside a block of size 88 free'd ==17479== at 0x4A07D6A: free (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so) ==17479== by 0x1CD07C15: _update_descr_and_dimensions (ctors.c:273) ==17479== by 0x1CD07C15: PyArray_NewFromDescr_int (ctors.c:900) ==17479== by 0x1CD0BB76: PyArray_NewFromDescr (ctors.c:1121) ==17479== by 0x1CD0BB76: array_fromfile_binary (ctors.c:3168) ==17479== by 0x1CD0BB76: PyArray_FromFile (ctors.c:3304) ==17479== by 0x1CD886B5: array_fromfile (multiarraymodule.c:2040) ==17479== by 0x37AB6E2571: do_call (ceval.c:4327) ==17479== by 0x37AB6E2571: call_function (ceval.c:4135) ==17479== by 0x37AB6E2571: PyEval_EvalFrameEx (ceval.c:2755) ==17479== by 0x37AB6E2665: fast_function (ceval.c:4198) ==17479== by 0x37AB6E2665: call_function (ceval.c:4133) ==17479== by 0x37AB6E2665: PyEval_EvalFrameEx (ceval.c:2755) ==17479== by 0x37AB6E2665: fast_function (ceval.c:4198) ==17479== by 0x37AB6E2665: call_function (ceval.c:4133) ==17479== by 0x37AB6E2665: PyEval_EvalFrameEx (ceval.c:2755) ==17479== by 0x37AB664DCB: gen_send_ex.isra.0 (genobject.c:85) ==17479== by 0x37AB6DE419: PyEval_EvalFrameEx (ceval.c:2586) ==17479== by 0x37AB664DCB: gen_send_ex.isra.0 (genobject.c:85) ==17479== by 0x37AB6DE419: PyEval_EvalFrameEx (ceval.c:2586) ==17479== by 0x37AB6E2665: fast_function (ceval.c:4198) ==17479== by 0x37AB6E2665: call_function (ceval.c:4133) ==17479== by 0x37AB6E2665: PyEval_EvalFrameEx (ceval.c:2755) ```
Silly but it works.
} | ||
NPY_BEGIN_ALLOW_THREADS; | ||
*nread = fread(PyArray_DATA(r), dtype->elsize, num, fp); | ||
NPY_END_ALLOW_THREADS; | ||
fail: | ||
Py_DECREF(dtype); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't want the Py_DECREF
/*NUMPY_API
* Generic new array creation routine.
*
* steals a reference to descr (even on failure)
*/
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Disagree.
Line 3267 adds an anchoring ref to protect the object against stealing, so the code a must to release this anchor before exiting, to maintain the documented behaviour.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it possible that the lost reference only occurs when there is also another error? I suspect that: https://github.com/rainwoodman/numpy/blob/patch-1/numpy/core/src/multiarray/ctors.c#L3425 may be the culprit. since the array_fromfile_binary
steals a reference (also on error).
IIRC, there are several open issues for |
probably time to run the python reference checker again |
Is there a goal in numpy 2.0 to get rid of all of these stolen references from the public API? |
There aren't currently any plans for numpy 2.0. There is work on making the API more evolveable in place, but it's not clear when/if that will bear fruit (it requires changes in the overall python packaging ecosystem), and even if we gain the technical ability it's not clear whether this reference stealing thing is fixable. It sucks, but it does work, and there's a ton of old legacy code that would all break. |
I am confident stealing can be avoided. It was probably introduced because the very first author didn't want to type an extra line to unref the dtype object. But I agree breaking ABI massively is not always a wise decision, now that the very first decision when numpy was coded up has been solidified. |
I think the history has to do with the fact that in the precursor codebase that numpy evolved from, dtypes were just a plain non-reference-counted C struct rather than a reference-counted Python object, so the expectation was that the caller would allocate it and then pass it into an array-creation function, and the array would take ownership. But it doesn't really matter :-) And yeah, the other way of doing the API would be possible and better -- it's just that it's a hard decision to back out at this point. (I guess that we could add a new set of API functions that act correctly, and then at least people who don't want to deal with this weirdness could use them and ignore it.) |
I don't think this is correct. It looks like the problem is that when there is a subarray |
The real bug is in EDIT: I don't think it is fixable, the decref is probably correct. So we should document that the descr is not guaranteed to be usable after a successful call and audit our code to make sure it is safe. |
I opened issue #7756 for this as a reference. I've added some comments and made a small cleanup to the commit message and |
When the input dtype has a subarray, the dtype is DECREFed by PyArray_NewFromDescr,
before dtype->elsize is accessed.
If no one else holds a reference to the dtype object, then the dtype object will be destroyed,
and dtype->elsize shall not be accessed.
This raises an error in Valgrind, and occasionally crashes innocently looking code.
e.g.
numpy.fromfile('filename', dtype=('f8', 3'))
A workaround would be
dtype=numpy.dtype(('f8', 3)); numpy.fromfile('filename', dtype=dtype)
This affects versions as early as 1.9.2 (where I found this bug) and seems to be still relevant today.
I hope someone can prove me wrong.
This PR is just a demonstration of the idea. I didn't try to compile it.
Valgrind log: