-
-
Notifications
You must be signed in to change notification settings - Fork 10.9k
MAINT: refactor ufunc iter operand flags handling #11580
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@mattip - I only had a glancing look so far, but I like this as a refactoring, even independently of the final goal. Was the idea to do this over multiple PRs, or over multiple commits? I think the former may be better (at least for this one), so ping me when this is no longer a WIP. |
@mhvk I have gone about as far as I can go with this refactoring. I am not sure where in the code the calls to the new Note that the function itself changed flag handling logic so that if |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mattip - The changes look good but I am a bit uncomfortable with the changes to the input flags (outputs are OK, since they were never used anyway). See longer note.
cc @eric-wieser as I think it would be good to have another look.
numpy/core/src/umath/ufunc_object.c
Outdated
@@ -302,6 +302,49 @@ _find_array_prepare(ufunc_full_args args, | |||
return; | |||
} | |||
|
|||
NPY_NO_EXPORT void | |||
_ufunc_setup_flags(PyUFuncObject *ufunc, npy_intp op_in_flags, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't the types of op_in_flags
, op_out_flags
, and op_flags
all be the same (i.e., all npy_uint32
)? Otherwise, the assignment can be problematic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixing
numpy/core/src/umath/ufunc_object.c
Outdated
for (i = 0; i < nin; ++i) { | ||
op_flags[i] = ufunc->op_flags[i]; | ||
if (op_flags[i] == 0) { | ||
op_flags[i] = op_in_flags | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If op_in_flags
is meant to be the default, why are we adding flags to it? I guess these are always needed? But then, shouldn't they always be added to ufunc->op_flags[i]
as well? If the flags are not always needed, I think it may be better to just pass those defaults into the function.
In either case, I think the loop (both for in and out) itself should be:
op_flags[i] = ufunc->op_flags[i] ? ufunc->op_flags[i] : op_in_flags;
with then possibly the or'ing with the "standard" flags after it if they are always needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As I stated in the justification for the pull request: there is curently no option for user-defined flags to override the flags here, and I definitely want to exclude NPY_ITER_OVERLAP_ASSUME_ELEMENTWISE. I could also imagine a user who would want to exclude NPY_ITER_ALIGNED if their inner loop does not depend on element access being aligned. I have no strong opinion on whether the op_in_flags
should be only the exceptions from the common set of flags or should be orred with a NPY_UFUNC_DEFAULT_OP_FLAGS = NPY_ITER_READONLY | NPY_ITER_ALIGNED | NPY_ITER_OVERLAP_ASSUMED_ELEMENTWISE
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess the question is whether there are people out there relying on the flags they define being OR'd with the default ones; we ideally do not break their code.
numpy/core/src/umath/ufunc_object.c
Outdated
* possible temporary array. | ||
*/ | ||
op_flags[i] = op_out_flags | | ||
NPY_ITER_ALIGNED | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the outputs, since previously their op_flags were ignored, a change in behaviour would seem OK.
Codecov Report
@@ Coverage Diff @@
## master #11580 +/- ##
=======================================
Coverage 85.7% 85.7%
=======================================
Files 327 327
Lines 81985 81985
=======================================
Hits 70264 70264
Misses 11721 11721 Continue to review full report at Codecov.
|
e7eb245
to
7480760
Compare
Rebased off master to reduce codecov noise, squashed commits, removed possibly backward-incompatible behavior with input flags |
@mhvk I left the input flag logic as it was previously |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With this, I think we're safe and can discuss the question of whether the logic for inputs should change separately.
I'm approving, but will wait a day before merging to give @eric-wieser a chance to chime in.
@eric-wieser any thoughts? |
* iterator, so that the elements we don't write to are copied to the | ||
* possible temporary array. | ||
*/ | ||
_ufunc_setup_flags(ufunc, NPY_ITER_COPY | NPY_UFUNC_DEFAULT_INPUT_FLAGS, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lines < 80 characters.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
numpy/core/src/umath/ufunc_object.c
Outdated
} | ||
if (wheremask != NULL) { | ||
/* Set up the flags. */ | ||
default_op_out_flags = NPY_ITER_NO_SUBTYPE | NPY_ITER_WRITEMASKED | NPY_UFUNC_DEFAULT_OUTPUT_FLAGS; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lines too long. also below.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
Needs some style fixes to break up the long lines. |
numpy/core/src/umath/ufunc_object.c
Outdated
npy_uint32 op_out_flags, npy_uint32 *op_flags) | ||
{ | ||
/* | ||
* Note input flags are handled differently than ouput flags. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
typo
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
reworked entirely
NPY_ITER_OVERLAP_ASSUME_ELEMENTWISE | ||
|
||
NPY_NO_EXPORT void | ||
_ufunc_setup_flags(PyUFuncObject *ufunc, npy_uint32 op_in_flags, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This could maybe do with a docstring
} | ||
} | ||
for (i = nin; i < nop; ++i) { | ||
op_flags[i] = ufunc->op_flags[i] ? ufunc->op_flags[i] : op_out_flags; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are op_flags
and ufunc->op_flags
here ever the same pointer?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No. op_flags
is the per-call copy of ufunc->op_flags
, updated for the actual call arguments and variation (reduce
, __call__
, reduceat
)
Does this second problem have a visible effect in |
@@ -1333,7 +1351,8 @@ execute_legacy_ufunc_loop(PyUFuncObject *ufunc, | |||
NPY_ORDER order, | |||
npy_intp buffersize, | |||
PyObject **arr_prep, | |||
ufunc_full_args full_args) | |||
ufunc_full_args full_args, | |||
npy_uint32 *op_flags) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Docstring for this function should be updated with this new argument
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reworked
The second problem was overlap between input and output, which means we need to use writeback semantics or another technique to prevent writing to input. As far as I can tell, the problem does not exist in |
ping |
@eric-wieser can we merge this? |
37d8f1b
to
9afa2f7
Compare
@eric-wieser can we merge this PR? It is blocking further progress with PR #11133 matmul-as-ufunc |
Gentle ping to see if we can get this one closed down. |
Merging. @eric-wieser if you have further comments please raise them in a new issue |
In order for the inner loop of matmul to use BLAS, we need to be able to require contiguous input and output memory. Additionally, if there is memory overlap between the output and the input we must not write the results to the output array until looping is finished.
linalg
solves this problem by copying in the inner loop, which solves the first problem but not the second.einsum
anddot
solve this problem by not going throughPyUFunc_GenericFunction
, i.e. they are not true gufuncs.I can use a custom
ufunc->type_resolver
function for matmul to copy the inputs to contiguous memory and create an output array with writeback semantics, but there is nowhere to trigger resolution of those semantics after iterating. The most convenient thing would be to use iter operand flags to get theNpyIter
to manage the writeback semantics for me. However, theNPY_ITER_OVERLAP_ASSUME_ELEMENTWISE
defeats that strategy, it tellsNpyIter
to avoid creating a writeback buffer under conditions that matmul requires a writeback.I tried one approach to solving this in PR #11381, which simply removed the
NPY_ITER_OVERLAP_ASSUME_ELEMENTWISE
flag from generalized ufuncs (ones that callPyUFunc_GeneralizedFunction
). It seems people count on that behaviour, and so the PR was rejected.This is an attempt at another approach, where the handling of the flags is pushed into the
ufunc->type_resolver
function, and I can override default behaviour in the matmul-specific one that anyhow I need in order to force input arrays to be contiguous.In this first step, I create a generic function to set per-operand flags and use it in the three looping strategies. In later steps I will try to move the function into the generic
ufunc->type_resolver
.