-
-
Notifications
You must be signed in to change notification settings - Fork 11k
BUG: Fix segfault in np.random.shuffle for arrays of different length strings #7719
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Good catch. LGTM. |
Maybe it would be better to use arr[0, ...] instead? Avoid the roundtrip to potential scalars? |
92adcb2
to
3d0c5bd
Compare
@seberg you're not wrong, a quick benchmark makes it look like that is ~6x faster! |
I'd use |
What we should have for sure is a comment in the code explaining why this is needed. |
3d0c5bd
to
77281e9
Compare
@jaimefrio Yes I agree, whilst the code was self explanatory when using the @mhvk I think that version is easier on the old grey matter as well! |
@@ -5064,7 +5064,8 @@ cdef class RandomState: | |||
x_ptr = <char*><size_t>x.ctypes.data | |||
stride = x.strides[0] | |||
itemsize = x.dtype.itemsize | |||
buf = np.empty_like(x[0]) # GC'd at function exit | |||
# Ensure that buf has the same dtype as x | |||
buf = np.empty_like(x[:1]) # GC'd at function exit |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And arr[:1]
leaves a dimension that has no meaning ;) (below doing there buf[0, ...]
would somewhat make more sense then ;), at least to me). Anyway, don't care too much.
I don't like the comment though, it explains nothing unless you know the right question already. Rather explain that x[0]
would be a scalar, and that this may also lose the dtype information e.g. string types (it actually gets even worse, x[0]
could be another array, and things likely currently blow up (could be a good test).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, I think the object cases were probably already all fine (though I am not 100% sure, but oh wlel...)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indexing speaks more loudly to you than to the rest of us ;) Agree about the comment, more explanation is needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Something like
# The scalar x[0] has a dtype just big enough to hold that string,
# so use x[:1] to get a dtype big enough to hold any string in x.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe add at start: "If x is one-dimensional, x[0] would be a scalar with a dtype just ..."
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably a reference counting bug.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, of course, the view will incref its elements, then they get shuffled under its feet and the decref goes to the wrong one. Easiest solution is likely to just avoid that 1D optimization for object arrays by checking arr.dtype.hasobject
first.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Whoa, well sighted, definitely needs a comment as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Bit subtle, but buf
is left containing a reference to some object in x
, which is decrefed when the buffer is destroyed. Furthermore, buf
begins containing a reference to None
, so there is also a memory leak. The only other fix I can see is to define the buffer as int8
with length itemsize
or simply use malloc
and free
, which avoids all that ref counting stuff..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, that is maybe more elegant. Should probably make sure the raw
shuffle does not release the GIL then (no idea if it does).
77281e9
to
eda8f8f
Compare
np.random.shuffle will allocate a buffer based on the size of the first element of an array of strings. If the first element is smaller than another in the array this buffer will overflow, causing a segfault when garbage is collected. Additionally if the array contains objects then one would be left in the buffer and have it's refcount erroniously decrimented on function exit, causing that object to be deallocated too early. To fix this we change the buffer to be an array of int8 of the the size of the array's dtype, which sidesteps both issues. Fixes numpy#7710
eda8f8f
to
5657a6c
Compare
@charris Yes you're right that was a subtle one to track down. Very nice spot. I also like your solution, it's very elegant. I've implemented that now and added a test for the object case. Given that all of the swaps in the multidimensional case are done in python rather than doing memcpys so the object left in the buffer there will be correctly refcounted and the buffer was picking up the correct dtype anyway. I've left that part of the code unchanged. |
Thanks @simongibbons . |
Thanks! |
np.random.shuffle will allocate a buffer based on the size of the first
element of an array of strings. If the first element is smaller than
another in the array this buffer will overflow, causing a segfault
when garbage is collected.
This fixes the issue by ensuring the buffer is allocated based upon
the dtype of the array rather than just that of the first element.
Fixes #7710