8000 MAINT: refactor ufunc iter operand flags handling by mattip · Pull Request #11580 · numpy/numpy · GitHub
[go: up one dir, main page]

Skip to content

MAINT: refactor ufunc iter operand flags handling #11580

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Nov 6, 2018

Conversation

mattip
Copy link
Member
@mattip mattip commented Jul 17, 2018

In order for the inner loop of matmul to use BLAS, we need to be able to require contiguous input and output memory. Additionally, if there is memory overlap between the output and the input we must not write the results to the output array until looping is finished. linalg solves this problem by copying in the inner loop, which solves the first problem but not the second. einsum and dot solve this problem by not going through PyUFunc_GenericFunction, i.e. they are not true gufuncs.

I can use a custom ufunc->type_resolver function for matmul to copy the inputs to contiguous memory and create an output array with writeback semantics, but there is nowhere to trigger resolution of those semantics after iterating. The most convenient thing would be to use iter operand flags to get the NpyIter to manage the writeback semantics for me. However, the NPY_ITER_OVERLAP_ASSUME_ELEMENTWISE defeats that strategy, it tells NpyIter to avoid creating a writeback buffer under conditions that matmul requires a writeback.

I tried one approach to solving this in PR #11381, which simply removed the NPY_ITER_OVERLAP_ASSUME_ELEMENTWISE flag from generalized ufuncs (ones that call PyUFunc_GeneralizedFunction). It seems people count on that behaviour, and so the PR was rejected.

This is an attempt at another approach, where the handling of the flags is pushed into the ufunc->type_resolver function, and I can override default behaviour in the matmul-specific one that anyhow I need in order to force input arrays to be contiguous.

In this first step, I create a generic function to set per-operand flags and use it in the three looping strategies. In later steps I will try to move the function into the generic ufunc->type_resolver.

@mhvk
Copy link
Contributor
mhvk commented Jul 17, 2018

@mattip - I only had a glancing look so far, but I like this as a refactoring, even independently of the final goal. Was the idea to do this over multiple PRs, or over multiple commits? I think the former may be better (at least for this one), so ping me when this is no longer a WIP.

@mattip mattip changed the title WIP: MAINT: refactor ufunc iter operand flags handling MAINT: refactor ufunc iter operand flags handling Jul 22, 2018
@mattip
Copy link
Member Author
mattip commented Jul 22, 2018

@mhvk I have gone about as far as I can go with this refactoring. I am not sure where in the code the calls to the new _ufunc_setup_flags function should be placed, I pushed the calls up to be as early as possible just to prove that the calls work and nothing else overrides the flags.

Note that the function itself changed flag handling logic so that if ufunc->op_flags[i] is set, the call will not override any of the flags. Previously output ufunc->op_flags were never read, and input flags were ored together. This is to allow the behaviour I need for matmul-as-a-ufunc (PR #11381 and the description above)

Copy link
Contributor
@mhvk mhvk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mattip - The changes look good but I am a bit uncomfortable with the changes to the input flags (outputs are OK, since they were never used anyway). See longer note.

cc @eric-wieser as I think it would be good to have another look.

@@ -302,6 +302,49 @@ _find_array_prepare(ufunc_full_args args,
return;
}

NPY_NO_EXPORT void
_ufunc_setup_flags(PyUFuncObject *ufunc, npy_intp op_in_flags,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't the types of op_in_flags, op_out_flags, and op_flags all be the same (i.e., all npy_uint32)? Otherwise, the assignment can be problematic.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixing

for (i = 0; i < nin; ++i) {
op_flags[i] = ufunc->op_flags[i];
if (op_flags[i] == 0) {
op_flags[i] = op_in_flags |
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If op_in_flags is meant to be the default, why are we adding flags to it? I guess these are always needed? But then, shouldn't they always be added to ufunc->op_flags[i] as well? If the flags are not always needed, I think it may be better to just pass those defaults into the function.

In either case, I think the loop (both for in and out) itself should be:

op_flags[i] = ufunc->op_flags[i] ? ufunc->op_flags[i] : op_in_flags;

with then possibly the or'ing with the "standard" flags after it if they are always needed.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I stated in the justification for the pull request: there is curently no option for user-defined flags to override the flags here, and I definitely want to exclude NPY_ITER_OVERLAP_ASSUME_ELEMENTWISE. I could also imagine a user who would want to exclude NPY_ITER_ALIGNED if their inner loop does not depend on element access being aligned. I have no strong opinion on whether the op_in_flags should be only the exceptions from the common set of flags or should be orred with a NPY_UFUNC_DEFAULT_OP_FLAGS = NPY_ITER_READONLY | NPY_ITER_ALIGNED | NPY_ITER_OVERLAP_ASSUMED_ELEMENTWISE

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess the question is whether there are people out there relying on the flags they define being OR'd with the default ones; we ideally do not break their code.

* possible temporary array.
*/
op_flags[i] = op_out_flags |
NPY_ITER_ALIGNED |
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the outputs, since previously their op_flags were ignored, a change in behaviour would seem OK.

@codecov-io
Copy link
codecov-io commented Jul 25, 2018

Codecov Report

Merging #11580 into master will not change coverage.
The diff coverage is n/a.

Impacted file tree graph

@@           Coverage Diff           @@
##           master   #11580   +/-   ##
=======================================
  Coverage    85.7%    85.7%           
=======================================
  Files         327      327           
  Lines       81985    81985           
=======================================
  Hits        70264    70264           
  Misses      11721    11721

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 210f07f...7480760. Read the comment docs.

@mattip mattip force-pushed the ufunc-flag-refactor branch from e7eb245 to 7480760 Compare July 25, 2018 17:07
@mattip
Copy link
Member Author
mattip commented Jul 25, 2018

Rebased off master to reduce codecov noise, squashed commits, removed possibly backward-incompatible behavior with input flags

@mattip
Copy link
Member Author
mattip commented Jul 25, 2018

@mhvk I left the input flag logic as it was previously

Copy link
Contributor
@mhvk mhvk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With this, I think we're safe and can discuss the question of whether the logic for inputs should change separately.

I'm approving, but will wait a day before merging to give @eric-wieser a chance to chime in.

@mattip
Copy link
Member Author
mattip commented Aug 6, 2018

@eric-wieser any thoughts?

* iterator, so that the elements we don't write to are copied to the
* possible temporary array.
*/
_ufunc_setup_flags(ufunc, NPY_ITER_COPY | NPY_UFUNC_DEFAULT_INPUT_FLAGS,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lines < 80 characters.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

8000
}
if (wheremask != NULL) {
/* Set up the flags. */
default_op_out_flags = NPY_ITER_NO_SUBTYPE | NPY_ITER_WRITEMASKED | NPY_UFUNC_DEFAULT_OUTPUT_FLAGS;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lines too long. also below.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

@charris
Copy link
Member
charris commented Aug 15, 2018

Needs some style fixes to break up the long lines.

npy_uint32 op_out_flags, npy_uint32 *op_flags)
{
/*
* Note input flags are handled differently than ouput flags.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

reworked entirely

NPY_ITER_OVERLAP_ASSUME_ELEMENTWISE

NPY_NO_EXPORT void
_ufunc_setup_flags(PyUFuncObject *ufunc, npy_uint32 op_in_flags,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could maybe do with a docstring

}
}
for (i = nin; i < nop; ++i) {
op_flags[i] = ufunc->op_flags[i] ? ufunc->op_flags[i] : op_out_flags;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are op_flags and ufunc->op_flags here ever the same pointer?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No. op_flags is the per-call copy of ufunc->op_flags, updated for the actual call arguments and variation (reduce, __call__, reduceat)

@eric-wieser
Copy link
Member

linalg solves this problem by copying in the inner loop, which solves the first problem but not the second

Does this second problem have a visible effect in linalg?

@@ -1333,7 +1351,8 @@ execute_legacy_ufunc_loop(PyUFuncObject *ufunc,
NPY_ORDER order,
npy_intp buffersize,
PyObject **arr_prep,
ufunc_full_args full_args)
ufunc_full_args full_args,
npy_uint32 *op_flags)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Docstring for this function should be updated with this new argument

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reworked

@mattip
Copy link
Member Author
mattip commented Aug 21, 2018

Does this second problem have a visible effect in linalg?

The second problem was overlap between input and output, which means we need to use writeback semantics or another technique to prevent writing to input. As far as I can tell, the problem does not exist in linalg functions in the first place, correct me if I am wrong.

@mattip
Copy link
Member Author
mattip commented Sep 9, 2018

ping

@mattip
Copy link
Member Author
mattip commented Oct 19, 2018

@eric-wieser can we merge this?

@mattip mattip mentioned this pull request Oct 19, 2018
6 tasks
@eric-wieser eric-wieser self-assigned this Oct 19, 2018
@mattip mattip force-pushed the ufunc-flag-refactor branch from 37d8f1b to 9afa2f7 Compare October 21, 2018 05:51
@mattip
Copy link
Member Author
mattip commented Oct 27, 2018

@eric-wieser can we merge this PR? It is blocking further progress with PR #11133 matmul-as-ufunc

@stefanv
Copy link
Contributor
stefanv commented Oct 31, 2018

Gentle ping to see if we can get this one closed down.

@mattip
Copy link
Member Author
mattip commented Nov 6, 2018

Merging. @eric-wieser if you have further comments please raise them in a new issue

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants
0