The current implementation of both ufuncs and gufuncs (those with core dimensions and an inner-loop signature) uses the NPY_ITER_OVERLAP_ASSUME_ELEMENTWISE flag to avoid allocating buffer memory when it can be determined that:
the data pointers of all overlapping operands are equal
the strides and dimensions are equivalent
the dtypes are equal.
solve_may_have_internal_overlap() for single-byte overlap returns 0
Let's call this element-wise aliasing, since it is intended for elementwise ufuncs like np.sin.
In all other cases of overlap, output ndarrays are given temporary memory and use writeback semantics to copy the result back afterwards.
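To make the distinction concrete, here is a minimal Python sketch. The helper `is_elementwise_alias` is hypothetical: it only mirrors the first three conditions listed above at the Python level and is not the C-level check NumPy performs. The second half assumes NumPy 1.13 or later, where a ufunc call with overlapping operands gives the same result as the equivalent call without overlap.

```python
import numpy as np


def is_elementwise_alias(a, b):
    """Hypothetical helper: same data pointer, same layout, same dtype."""
    same_pointer = (a.__array_interface__["data"][0]
                    == b.__array_interface__["data"][0])
    same_layout = a.shape == b.shape and a.strides == b.strides
    return same_pointer and same_layout and a.dtype == b.dtype


x = np.arange(1.0, 9.0)  # [1, 2, ..., 8]

# Exact aliasing: input and output are the very same view.
exact = x.copy()
assert is_elementwise_alias(exact, exact)
np.multiply(exact, 2, out=exact)

# Shifted overlap: memory is shared, but the data pointers differ,
# so this is not element-wise aliasing.
shifted = x.copy()
src, dst = shifted[:-1], shifted[1:]
assert np.shares_memory(src, dst)
assert not is_elementwise_alias(src, dst)
np.multiply(src, 2, out=dst)  # behaves as if src had been copied first
```

In the shifted case a naive in-place loop would read values it had already overwritten, which is why the overlap has to be resolved with a temporary.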
There should be a point in the gufunc call at which a gufunc can say "element-wise aliasing is OK", "leave all aliasing to the inner loop", or "always copy-on-any-overlap". We need a flag to indicate these (and maybe other, such as contiguous) strategies. See also PR #11381 (closed), which proposed unilaterally changing the default.
I'm not sure, but my initial impression is that the two relevant cases are: (1) the loop can tolerate some core inputs being identical to some core outputs, (2) the loop can't tolerate overlap at all. For cases where there's partial overlap between core dimensions, or overlap between different loop iterations, then maybe we should unconditionally copy? Are there any other interesting cases?
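The difference between these cases can be sketched in pure Python. Everything here is illustrative: `naive_matvec` stands in for an inner loop that cannot tolerate its output aliasing an input, and `call_with_policy` with its two policy strings is a hypothetical dispatcher, not an existing NumPy API.

```python
import numpy as np


def naive_matvec(m, v, out):
    """Stand-in inner loop: writes out[i] before later rows have read v."""
    for i in range(m.shape[0]):
        acc = 0.0
        for j in range(m.shape[1]):
            acc += m[i, j] * v[j]
        out[i] = acc


def call_with_policy(loop, m, v, out, policy):
    """Hypothetical dispatcher for two of the aliasing strategies."""
    overlaps = np.shares_memory(v, out) or np.shares_memory(m, out)
    if policy == "copy-on-any-overlap" and overlaps:
        tmp = np.empty_like(out)   # temporary output buffer
        loop(m, v, tmp)
        out[...] = tmp             # write back once the loop is done
    else:
        loop(m, v, out)            # aliasing is left to the inner loop
    return out


swap = np.array([[0.0, 1.0], [1.0, 0.0]])  # swaps the two vector entries

v1 = np.array([1.0, 2.0])
aliased = call_with_policy(naive_matvec, swap, v1, v1, "leave-to-inner-loop")

v2 = np.array([1.0, 2.0])
copied = call_with_policy(naive_matvec, swap, v2, v2, "copy-on-any-overlap")
```

With the output aliasing the input vector, the uncopied call produces `[2, 2]` instead of the correct `[2, 1]`, because the second row reads an entry the first row already overwrote. A loop that can tolerate identical core inputs and outputs would need no copy at all, which is the case a per-gufunc flag would let it declare.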
The intended meaning of the ASSUME_ELEMENTWISE flag is that given the
input arrays, for each iterator outer loop index, the inner loop is
guaranteed to not touch memory associated with other iterator outer
indices, so that the overlap detection can do reasoning on the outer
loop level.
The current implementation of ELEMENTWISE is only a special case that's
easy to reason about, and further refinement could be done later.
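The property described here, that each outer index touches memory disjoint from every other outer index, can be checked by brute force on small arrays. The sketch below is only an illustration of that property under the assumption of at least one core dimension; the function names are hypothetical and this is not how NumPy's overlap detection is implemented.

```python
import itertools

import numpy as np
from numpy.lib.stride_tricks import as_strided


def bytes_touched(sub):
    """Set of byte addresses covered by the elements of a (small) view."""
    base = sub.__array_interface__["data"][0]
    touched = set()
    for idx in itertools.product(*(range(n) for n in sub.shape)):
        start = base + sum(i * s for i, s in zip(idx, sub.strides))
        touched.update(range(start, start + sub.itemsize))
    return touched


def outer_iterations_disjoint(arr, n_core):
    """True if no two outer indices share memory in their core sub-arrays.

    Requires n_core >= 1 so that arr[outer_idx] is a view, not a scalar.
    """
    outer_shape = arr.shape[:arr.ndim - n_core]
    seen = set()
    for outer_idx in np.ndindex(*outer_shape):
        touched = bytes_touched(arr[outer_idx])
        if seen & touched:
            return False
        seen |= touched
    return True


# Ordinary 2-D array: each row (one core dimension) has its own memory.
plain = np.zeros((3, 4))
plain_ok = outer_iterations_disjoint(plain, n_core=1)

# Sliding windows over 6 float64 values: consecutive rows share elements.
windows = as_strided(np.zeros(6), shape=(3, 4), strides=(8, 8))
windows_ok = outer_iterations_disjoint(windows, n_core=1)
```

For the ordinary array the outer iterations are independent, so reasoning at the outer-loop level is sound. For the sliding-window view they are not, so an operand like this would fall outside the guarantee the flag is meant to express.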