BUG: better detect memory overlap when calling gufuncs #11381
Conversation
Looking through scipy and astropy, I find no use of NPY_ITER_OVERLAP_ASSUME_ELEMENTWISE.
Found and fixed a bug in 4c75113. This could be a separate PR, but they both come under the heading of gufuncs and memory overlap.
NPY_ITER_ALIGNED |
NPY_ITER_OVERLAP_ASSUME_ELEMENTWISE;
NPY_ITER_ALIGNED;
if (op[i] && PyArray_NDIM(op[i]) < 2) {
What's special about 0d / 1d arrays?
My fix was wrong. Any element-wise gufunc (one with an empty signature, '(),()->()') will not hit this code path, since it will not have core_enabled. Removing this flag.
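A rough way to see this distinction from Python, assuming a reasonably recent NumPy (matmul became a true gufunc in 1.16): the ufunc .signature attribute is None for element-wise ufuncs and non-None for gufuncs with core dimensions, i.e. the ones that reach PyUFunc_GeneralizedFunction.

```python
import numpy as np

# Element-wise ufuncs have no core dimensions, so they never take the
# PyUFunc_GeneralizedFunction code path discussed here.
print(np.add.signature)      # None

# A true gufunc reports its core signature; for matmul (NumPy >= 1.16)
# this is '(n?,k),(k,m?)->(n?,m?)'.
print(np.matmul.signature)
```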
Nice catch - a copy & paste error from quite a while ago, I suspect! Looks good modulo the single-element thing you're already removing.
One can also argue this is a bug in the matmul ufunc implementation, and the NPY_ITER_OVERLAP_ASSUME_ELEMENTWISE flag is as intended. I think there is a design decision that ufunc inner loops are supposed to be able to deal with overlapping operands/aliasing, which is certainly assumed in the scalar case. This is of course more difficult to deal with in the gufunc case (I don't recall whether the inner loop core dimensions are contiguous in memory or not). I suspect (without trying) that matmul.accumulate and reduce don't work correctly currently, which would demonstrate it's a bug in the inner loop implementation. EDIT: I see now reduce/accumulate raise an error saying these operations are not defined for gufuncs, fair enough, although the reason why is not obvious at first glance. Anyway, I'd prefer to make an explicit design decision here, rather than considering this a bugfix.
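To make the element-wise vs. core-dimension distinction concrete, a small Python sketch (the naive_matvec helper is hypothetical, only illustrating the hazard a core loop faces when out aliases an input):

```python
import numpy as np

# Element-wise loops tolerate aliasing: out[i] depends only on a[i].
a = np.arange(5.0)
np.add(a, 1.0, out=a)                      # safe, result is a + 1

# A core operation like matrix-vector multiplication does not: each output
# element depends on many input elements, so a loop writing into an aliased
# buffer reads values it has already clobbered.
def naive_matvec(mat, x, out):
    for i in range(mat.shape[0]):
        out[i] = mat[i] @ x                # wrong if out is x
    return out

m = np.ones((3, 3))
x = np.arange(3.0)
good = naive_matvec(m, x.copy(), np.empty(3))
bad = naive_matvec(m, x, x)                # aliased call
print(good, bad)                           # roughly [3. 3. 3.] vs [3. 6. 11.]
```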
The inner loops just get access to the arrays; the loop dimensions do not have to be contiguous. In principle, the inner loop indeed should check whether in and out overlap. Indeed, I just wrote code that explicitly uses the fact that in and out can be the same to speed up the loop... I hadn't quite appreciated that this change might make that much less efficient...
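A hedged Python sketch of the kind of optimisation being described (the wrap_angles helper is hypothetical; the real code lives in a compiled gufunc inner loop): when out is literally the input buffer, the initial copy can be skipped, which is exactly the saving a forced copy-on-overlap would take away.

```python
import numpy as np

# Hypothetical helper that deliberately exploits in/out aliasing:
# the copy is performed only when the buffers actually differ.
def wrap_angles(angles, out=None):
    if out is None:
        out = np.empty_like(angles)
    if out is not angles:
        np.copyto(out, angles)     # copy only when in and out differ
    out %= 2.0 * np.pi             # element-wise update, safe in place
    return out

a = np.linspace(-10.0, 10.0, 5)
wrap_angles(a, out=a)              # aliased call avoids the copy entirely
print(a)
```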
Closing this (I will refactor the einsum issue to a different PR), but I think that code is copy-pasted and wrong. The documentation seems to strengthen my claim, as do a couple of other comments:
@mhvk can you share your gufunc? I am curious how I could turn it into a test and rewrite it to not depend on this IMO buggy behaviour.
Whenever possible, I think it'd be good to move towards unifying the gufunc/ufunc semantics and code paths. Also, I think there may be cases where gufuncs want to allow aliasing and handle it specially? For example, consider a ...

It's also entirely possible that at some point we'll have simple ufuncs where handling aliasing is difficult -- e.g. think about a ufunc where the scalars are some complex multi-field dtype. So I lean towards saying that we should handle gufuncs and ufuncs the same way here... which might include adding some flag that a loop can use to say that it doesn't handle aliasing, and ask numpy to take care of that for it using common machinery?
@mattip - it is exactly the case of identical in and out that may need to be preserved (though your point about being able to add but not subtract a flag is taken). The PR that I mentioned is astropy/astropy#7502, but that may not help you much since it uses a rather complicated ...
There should be a point in the ufunc call where a ufunc can say "element-wise aliasing is OK", or "leave all aliasing to the inner loop", or "always copy-on-overlap".
Ufuncs need to check output args for memory overlap. In PyUFunc_GeneralizedFunction, the check assumed the gufunc was operating elementwise.

Found when I started implementing matmul as a gufunc. We seem to have avoided this issue by not having an out kwarg in our current gufuncs (i.e. linalg, see issue #11380).

This may impact downstream users of gufuncs who already try to detect this condition, but it brings gufuncs into alignment with ufuncs and with the comment in the code: "For generalized ufuncs, we can't do buffering, so must COPY or UPDATEIFCOPY".

Test added that failed before, now passes. An additional passing test added for matmul, which currently uses einsum for its implementation, but that will change in the future.
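A rough sketch of the overlap condition such a test exercises (not the actual test added in this PR): calling a gufunc with out aliasing its inputs. On NumPy versions that include this overlap handling the aliased call should agree with the unaliased reference; on versions without it, the result can silently be wrong.

```python
import numpy as np

a = np.arange(9.0).reshape(3, 3)
reference = a.copy() @ a.copy()        # computed without any aliasing

# `out` aliases both inputs; with overlap detection NumPy copies as needed,
# without it the inner loop can read elements it has already overwritten.
aliased = np.matmul(a, a, out=a)
print(np.allclose(reference, aliased))
```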