10000 BUG: Invalid read of size 4 in PyArray_FromFile by rainwoodman · Pull Request #7175 · numpy/numpy · GitHub
[go: up one dir, main page]

Skip to content

BUG: Invalid read of size 4 in PyArray_FromFile #7175

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from

Conversation

rainwoodman
Copy link
Contributor

When the input dtype has a subarray, the dtype is DECREFed by PyArray_NewFromDescr,
before dtype->elsize is accessed.

If no one else holds a reference to the dtype object, then the dtype object will be destroyed,
and dtype->elsize shall not be accessed.
This raises an error in Valgrind, and occasionally crashes innocently looking code.
e.g. numpy.fromfile('filename', dtype=('f8', 3'))

A workaround would be
dtype=numpy.dtype(('f8', 3)); numpy.fromfile('filename', dtype=dtype)

This affects versions as early as 1.9.2 (where I found this bug) and seems to be still relevant today.
I hope someone can prove me wrong.
This PR is just a demonstration of the idea. I didn't try to compile it.

Valgrind log:

==17479== Invalid read of size 4
==17479==    at 0x1CD0BB88: UnknownInlinedFun (stdio2.h:295)
==17479==    by 0x1CD0BB88: array_fromfile_binary (ctors.c:3177)
==17479==    by 0x1CD0BB88: PyArray_FromFile (ctors.c:3304)
==17479==    by 0x1CD886B5: array_fromfile (multiarraymodule.c:2040)
==17479==    by 0x37AB6E2571: do_call (ceval.c:4327)
==17479==    by 0x37AB6E2571: call_function (ceval.c:4135)
==17479==    by 0x37AB6E2571: PyEval_EvalFrameEx (ceval.c:2755)
==17479==    by 0x37AB6E2665: fast_function (ceval.c:4198)
==17479==    by 0x37AB6E2665: call_function (ceval.c:4133)
==17479==    by 0x37AB6E2665: PyEval_EvalFrameEx (ceval.c:2755)
==17479==    by 0x37AB6E2665: fast_function (ceval.c:4198)
==17479==    by 0x37AB6E2665: call_function (ceval.c:4133)
==17479==    by 0x37AB6E2665: PyEval_EvalFrameEx (ceval.c:2755)
==17479==    by 0x37AB664DCB: gen_send_ex.isra.0 (genobject.c:85)
==17479==    by 0x37AB6DE419: PyEval_EvalFrameEx (ceval.c:2586)
==17479==    by 0x37AB664DCB: gen_send_ex.isra.0 (genobject.c:85)
==17479==    by 0x37AB6DE419: PyEval_EvalFrameEx (ceval.c:2586)
==17479==    by 0x37AB6E2665: fast_function (ceval.c:4198)
==17479==    by 0x37AB6E2665: call_function (ceval.c:4133)
==17479==    by 0x37AB6E2665: PyEval_EvalFrameEx (ceval.c:2755)
==17479==    by 0x37AB6E36B3: PyEval_EvalCodeEx (ceval.c:3344)
==17479==    by 0x37AB6E25C5: fast_function (ceval.c:4208)
==17479==    by 0x37AB6E25C5: call_function (ceval.c:4133)
==17479==    by 0x37AB6E25C5: PyEval_EvalFrameEx (ceval.c:2755)
==17479==  Address 0x113bef20 is 32 bytes inside a block of size 88 free'd
==17479==    at 0x4A07D6A: free (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==17479==    by 0x1CD07C15: _update_descr_and_dimensions (ctors.c:273)
==17479==    by 0x1CD07C15: PyArray_NewFromDescr_int (ctors.c:900)
==17479==    by 0x1CD0BB76: PyArray_NewFromDescr (ctors.c:1121)
==17479==    by 0x1CD0BB76: array_fromfile_binary (ctors.c:3168)
==17479==    by 0x1CD0BB76: PyArray_FromFile (ctors.c:3304)
==17479==    by 0x1CD886B5: array_fromfile (multiarraymodule.c:2040)
==17479==    by 0x37AB6E2571: do_call (ceval.c:4327)
==17479==    by 0x37AB6E2571: call_function (ceval.c:4135)
==17479==    by 0x37AB6E2571: PyEval_EvalFrameEx (ceval.c:2755)
==17479==    by 0x37AB6E2665: fast_function (ceval.c:4198)
==17479==    by 0x37AB6E2665: call_function (ceval.c:4133)
==17479==    by 0x37AB6E2665: PyEval_EvalFrameEx (ceval.c:2755)
==17479==    by 0x37AB6E2665: fast_function (ceval.c:4198)
==17479==    by 0x37AB6E2665: call_function (ceval.c:4133)
==17479==    by 0x37AB6E2665: PyEval_EvalFrameEx (ceval.c:2755)
==17479==    by 0x37AB664DCB: gen_send_ex.isra.0 (genobject.c:85)
==17479==    by 0x37AB6DE419: PyEval_EvalFrameEx (ceval.c:2586)
==17479==    by 0x37AB664DCB: gen_send_ex.isra.0 (genobject.c:85)
==17479==    by 0x37AB6DE419: PyEval_EvalFrameEx (ceval.c:2586)
==17479==    by 0x37AB6E2665: fast_function (ceval.c:4198)
==17479==    by 0x37AB6E2665: call_function (ceval.c:4133)
==17479==    by 0x37AB6E2665: PyEval_EvalFrameEx (ceval.c:2755)

When the input dtype has a subarray, the dtype is DECREFed by PyArray_NewFromDescr, 
before dtype->elsize is accessed. 

If no one else holds a reference to the dtype object, then the dtype object will be destroyed, 
and dtype->elsize shall not be accessed. 
This raises an error in Valgrind, and occasionally crashes innocently looking code.
e.g. ```numpy.fromfile('filename', dtype=('f8', 3'))``` 

A workaround would be
```dtype=numpy.dtype(('f8', 3)); numpy.fromfile('filename', dtype=dtype)```

This affects versions as early as 1.9.2 (where I found this bug) and seems to be still relevant today. 
I hope someone can prove me wrong.

Valgrind log:
```
==17479== Invalid read of size 4
==17479==    at 0x1CD0BB88: UnknownInlinedFun (stdio2.h:295)
==17479==    by 0x1CD0BB88: array_fromfile_binary (ctors.c:3177)
==17479==    by 0x1CD0BB88: PyArray_FromFile (ctors.c:3304)
==17479==    by 0x1CD886B5: array_fromfile (multiarraymodule.c:2040)
==17479==    by 0x37AB6E2571: do_call (ceval.c:4327)
==17479==    by 0x37AB6E2571: call_function (ceval.c:4135)
==17479==    by 0x37AB6E2571: PyEval_EvalFrameEx (ceval.c:2755)
==17479==    by 0x37AB6E2665: fast_function (ceval.c:4198)
==17479==    by 0x37AB6E2665: call_function (ceval.c:4133)
==17479==    by 0x37AB6E2665: PyEval_EvalFrameEx (ceval.c:2755)
==17479==    by 0x37AB6E2665: fast_function (ceval.c:4198)
==17479==    by 0x37AB6E2665: call_function (ceval.c:4133)
==17479==    by 0x37AB6E2665: PyEval_EvalFrameEx (ceval.c:2755)
==17479==    by 0x37AB664DCB: gen_send_ex.isra.0 (genobject.c:85)
==17479==    by 0x37AB6DE419: PyEval_EvalFrameEx (ceval.c:2586)
==17479==    by 0x37AB664DCB: gen_send_ex.isra.0 (genobject.c:85)
==17479==    by 0x37AB6DE419: PyEval_EvalFrameEx (ceval.c:2586)
==17479==    by 0x37AB6E2665: fast_function (ceval.c:4198)
==17479==    by 0x37AB6E2665: call_function (ceval.c:4133)
==17479==    by 0x37AB6E2665: PyEval_EvalFrameEx (ceval.c:2755)
==17479==    by 0x37AB6E36B3: PyEval_EvalCodeEx (ceval.c:3344)
==17479==    by 0x37AB6E25C5: fast_function (ceval.c:4208)
==17479==    by 0x37AB6E25C5: call_function (ceval.c:4133)
==17479==    by 0x37AB6E25C5: PyEval_EvalFrameEx (ceval.c:2755)
==17479==  Address 0x113bef20 is 32 bytes inside a block of size 88 free'd
==17479==    at 0x4A07D6A: free (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==17479==    by 0x1CD07C15: _update_descr_and_dimensions (ctors.c:273)
==17479==    by 0x1CD07C15: PyArray_NewFromDescr_int (ctors.c:900)
==17479==    by 0x1CD0BB76: PyArray_NewFromDescr (ctors.c:1121)
==17479==    by 0x1CD0BB76: array_fromfile_binary (ctors.c:3168)
==17479==    by 0x1CD0BB76: PyArray_FromFile (ctors.c:3304)
==17479==    by 0x1CD886B5: array_fromfile (multiarraymodule.c:2040)
==17479==    by 0x37AB6E2571: do_call (ceval.c:4327)
==17479==    by 0x37AB6E2571: call_function (ceval.c:4135)
==17479==    by 0x37AB6E2571: PyEval_EvalFrameEx (ceval.c:2755)
==17479==    by 0x37AB6E2665: fast_function (ceval.c:4198)
==17479==    by 0x37AB6E2665: call_function (ceval.c:4133)
==17479==    by 0x37AB6E2665: PyEval_EvalFrameEx (ceval.c:2755)
==17479==    by 0x37AB6E2665: fast_function (ceval.c:4198)
==17479==    by 0x37AB6E2665: call_function (ceval.c:4133)
==17479==    by 0x37AB6E2665: PyEval_EvalFrameEx (ceval.c:2755)
==17479==    by 0x37AB664DCB: gen_send_ex.isra.0 (genobject.c:85)
==17479==    by 0x37AB6DE419: PyEval_EvalFrameEx (ceval.c:2586)
==17479==    by 0x37AB664DCB: gen_send_ex.isra.0 (genobject.c:85)
==17479==    by 0x37AB6DE419: PyEval_EvalFrameEx (ceval.c:2586)
==17479==    by 0x37AB6E2665: fast_function (ceval.c:4198)
==17479==    by 0x37AB6E2665: call_function (ceval.c:4133)
==17479==    by 0x37AB6E2665: PyEval_EvalFrameEx (ceval.c:2755)
```
rainwoodman added a commit to rainwoodman/nbodykit that referenced this pull request Feb 3, 2016
Silly but it works.
}
NPY_BEGIN_ALLOW_THREADS;
*nread = fread(PyArray_DATA(r), dtype->elsize, num, fp);
NPY_END_ALLOW_THREADS;
fail:
Py_DECREF(dtype);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't want the Py_DECREF

/*NUMPY_API
 * Generic new array creation routine.
 *
 * steals a reference to descr (even on failure)
 */

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Disagree.

Line 3267 adds an anchoring ref to protect the object against stealing, so the code a must to release this anchor before exiting, to maintain the documented behaviour.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible that the lost reference only occurs when there is also another error? I suspect that: https://github.com/rainwoodman/numpy/blob/patch-1/numpy/core/src/multiarray/ctors.c#L3425 may be the culprit. since the array_fromfile_binary steals a reference (also on error).

@charris
Copy link
Member
charris commented Feb 3, 2016

IIRC, there are several open issues for fromfile.

@juliantaylor
Copy link
Contributor

probably time to run the python reference checker again

@rainwoodman
Copy link
Contributor Author

Is there a goal in numpy 2.0 to get rid of all of these stolen references from the public API?

@njsmith
Copy link
Member
njsmith commented Feb 4, 2016

There aren't currently any plans for numpy 2.0. There is work on making the API more evolveable in place, but it's not clear when/if that will bear fruit (it requires changes in the overall python packaging ecosystem), and even if we gain the technical ability it's not clear whether this reference stealing thing is fixable. It sucks, but it does work, and there's a ton of old legacy code that would all break.

@rainwoodman
Copy link
Contributor Author

I am confident stealing can be avoided. It was probably introduced because the very first author didn't want to type an extra line to unref the dtype object.

But I agree breaking ABI massively is not always a wise decision, now that the very first decision when numpy was coded up has been solidified.

@njsmith
Copy link
Member
njsmith commented Feb 4, 2016

I think the history has to do with the fact that in the precursor codebase that numpy evolved from, dtypes were just a plain non-reference-counted C struct rather than a reference-counted Python object, so the expectation was that the caller would allocate it and then pass it into an array-creation function, and the array would take ownership. But it doesn't really matter :-) And yeah, the other way of doing the API would be possible and better -- it's just that it's a hard decision to back out at this point. (I guess that we could add a new set of API functions that act correctly, and then at least people who don't want to deal with this weirdness could use them and ignore it.)

@charris charris added this to the 1.12.0 release milestone Apr 16, 2016
@charris
Copy link
Member
charris commented Jun 17, 2016

I don't think this is correct. It looks like the problem is that when there is a subarray PyArray_NewFromDescr_int calls _update_descr_and_dimensions, which decrefs the original descriptor and assigns a new one. So in this (successful) case the descriptor isn't borrowed, it is deleted. I'm not sure what should be passed to fread in this case, but I would guess the easiest solution would be to Init an elsize variable before the call to PyArray_NewFromDescr and use that later. There are intricacies in the subarray case that I haven't thought out that may need more work. Note that array_from_text looks to have the same problem.

@charris
Copy link
Member
charris commented Jun 17, 2016

The real bug is in PyArray_NewFromDescr_int, which should always borrow the descriptor, not decref it. Unfortunately, it may be called recursively and the current implementation avoids a possible memory leak. Not sure if the subarray can contain a subarray, but if so that is a problem, and not just in the current function but in general use.

EDIT: I don't think it is fixable, the decref is probably correct. So we should document that the descr is not guaranteed to be usable after a successful call and audit our code to make sure it is safe.

@charris
Copy link
Member
charris commented Jun 18, 2016

I opened issue #7756 for this as a reference. I've added some comments and made a small cleanup to the commit message and array_fromfile_binary in #7757, so closing this. Thanks @rainwoodman .

@charris charris closed this Jun 18, 2016
@charris charris removed this from the 1.12.0 release milestone Jun 18, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants
0