Preliminary discussion: Standardise support for reuse of buffers #5

honnibal · 2020-09-21T22:45:03Z

honnibal
Sep 21, 2020

(These thoughts are not organised enough for an RFC, and I'm not sure whether the topic is covered elsewhere)

Most numpy routines support an argument out, into which the output of the operation should be stored. This is primarily intended to reduce data allocation and copying. However, this API design has problems in both its generality and its flexibility.

Whenever we offer an affordance in the API that lets the user help us be more efficient, we should take care that the affordance won't force an alternate implementation to be less efficient instead. The out argument has exactly this problem.

Let's say I'm using a numpy-like API, and I'm in a position where I could pass the routine an out buffer. I'm indifferent about this semantically; I just want to help it be efficient. So I pass the out buffer. But now the routine is forced to use my out buffer. If it used a temporary buffer to compute its result, it might have to make an extra copy.
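
To make the extra copy concrete, here is a minimal sketch. The routine `sorted_values` is hypothetical, and plain Python lists stand in for arrays. The point is only that an algorithm which naturally builds its own result has to copy it once more when the caller insists on out:

```python
def sorted_values(xs, out=None):
    """Hypothetical routine whose algorithm naturally builds a new list."""
    result = sorted(xs)   # the algorithm allocates its own result
    if out is None:
        return result     # nothing more to do
    out[:] = result       # forced extra copy into the caller's buffer
    return out


buf = [0, 0, 0]
res = sorted_values([3, 1, 2], out=buf)
```

Here the caller only wanted to save an allocation, but the routine ends up doing the allocation and a copy.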

The single out argument is also lacking in flexibility. I might have a routine that needs to allocate a large temporary buffer to do its work. If the user has this buffer handy (for instance, if they're working in a loop), they should have the chance to pass it in. But I'll need to come up with an ad hoc argument for this, and the user won't necessarily expect that the option is available.

My suggestion is to let routines take an optional argument that is a list of arrays sorted by descending size. Implementations may use these buffers as working memory instead of making allocations, and they may even use these buffers as return values. But they are not forced to use the buffers --- the user merely offers them as an option to the implementation, and the implementation can do whatever is most efficient.
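
A rough sketch of what that contract could look like, under some assumptions: `take_buffer` and `double_bytes` are made-up names, and `bytearray` objects stand in for arrays. The routine may return a view of an offered buffer, or ignore the offer and allocate:

```python
def take_buffer(tmp, nbytes):
    """Return the largest offered buffer if it holds `nbytes`, else None.

    `tmp` is assumed to be sorted by descending size, so looking at the
    first entry is enough to know whether anything fits.
    """
    if tmp and len(tmp[0]) >= nbytes:
        return tmp[0]
    return None


def double_bytes(data, tmp=None):
    """Hypothetical routine: double every byte of `data` (mod 256).

    Buffers in `tmp` are only an offer. The result may live in one of
    them, or in freshly allocated memory.
    """
    n = len(data)
    buf = take_buffer(tmp, n)
    if buf is None:
        buf = bytearray(n)        # nothing suitable offered: allocate
    out = memoryview(buf)[:n]     # view of exactly the bytes we need
    for i, b in enumerate(data):
        out[i] = (b * 2) % 256
    return out


scratch = bytearray(8)
result = double_bytes(b"\x01\x02\x03", tmp=[scratch])
```

The caller only ever relies on `result`; whether `scratch` was touched is the implementation's business.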

I would use a short name for this, specifically tmp. If it's a standard convention, I think it's worth the upfront cost of the less intuitive name, in order to get the extra conciseness everywhere else.

With numpy's out you can write either of the following:

xp.max(X, keepdims=True, out=X)
X = xp.max(X, keepdims=True, out=X)

With tmp, you would write:

X = xp.max(X, keepdims=True, tmp=[X])

Not assigning the return value might happen to work, depending on the implementation you were calling into, but it wouldn't be correct to rely on it.
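
To illustrate why, here is a small sketch with two hypothetical implementations of the same routine, both valid under the tmp convention (lists stand in for arrays). Code that assigns the return value works with both; code that drops it only works with one:

```python
def negate_reusing(xs, tmp=None):
    """Hypothetical implementation that writes into an offered buffer."""
    if tmp and len(tmp[0]) == len(xs):
        out = tmp[0]
    else:
        out = [0] * len(xs)
    for i, x in enumerate(xs):
        out[i] = -x
    return out


def negate_allocating(xs, tmp=None):
    """Equally valid hypothetical implementation that ignores tmp."""
    return [-x for x in xs]


# Correct usage: always take the return value.
for negate in (negate_reusing, negate_allocating):
    X = [1, 2, 3]
    X = negate(X, tmp=[X])
    assert X == [-1, -2, -3]

# Incorrect usage: the return value is dropped.
Y = [1, 2, 3]
negate_allocating(Y, tmp=[Y])   # Y is left unchanged
Z = [1, 2, 3]
negate_reusing(Z, tmp=[Z])      # Z happens to be overwritten
```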

rgommers · 2020-11-10T13:33:52Z

rgommers
Nov 10, 2020
Maintainer

Thanks @honnibal for the thoughts, and apologies for the long delay in replying (the second half of Sep was a bit of a train wreck that I only just recovered from).

> Most numpy routines support an argument out, into which the output of the operation should be stored. This is primarily intended to reduce data allocation and copying. However, this API design has problems in both its generality and its flexibility.

Completely agree. After you brought that up on Twitter I already had multiple discussions about this topic with people. We just published the array API standard for review, and mutability was probably the single most complex topic to get to a sensible solution for. We did leave out out=. See https://data-apis.github.io/array-api/latest/design_topics/copies_views_and_mutation.html

> My suggestion is to let routines take an optional argument that is a list of arrays sorted by descending size. Implementations may use these buffers as working memory instead of making allocations, and they may even use these buffers as return values. But they are not forced to use the buffers --- the user merely offers them as an option to the implementation, and the implementation can do whatever is most efficient.

This issue has the most detailed discussion, and there's a similar idea to your tmp there (called may_overwrite). JAX has a similar concept of buffers that may be reused, and names it donate_argnums.

> My suggestion is to let routines take an optional argument that is a list of arrays sorted by descending size

Do you have much use for multiple buffers? My experience with out= is that while it's possible to pass a list/tuple of arrays, this is hardly ever used. Also, putting the onus on the user to sort by size seems unnecessary?

> I would use a short name for this, specifically tmp. If it's a standard convention, I think it's worth the upfront cost of the less intuitive name, in order to get the extra conciseness everywhere else.

That argument does make sense I think - provided it indeed becomes a standard convention.

> Not assigning the return value might happen to work, depending on the implementation you were calling into, but it wouldn't be correct to rely on it.

If we go this way, it would be nice to have some tooling available that verifies that temporary buffers do not get reused as variables. I'm a bit wary of leaving a footgun for users otherwise.

honnibal · 2020-11-30T10:44:45Z

honnibal
Nov 30, 2020
Author

No worries about the delay. If you're anything like me, your GitHub notifications are pretty flooded anyway, so it's really easy to miss stuff! I actually only just happened to think of this thread...

> Do you have much use for multiple buffers? My experience with out= is that while it's possible to pass a list/tuple of arrays, this is hardly ever used.

I think we want the standard to offer enough efficiency support. If it doesn't offer enough, routines will start getting tempted to introduce their own ad hoc mechanisms as well. I'm thinking that you want to be able to pass the tmp buffers through a chain of calls. An "operation" could be arbitrarily complex.

> Also, putting the onus on the user to sort by size seems unnecessary?

I'm thinking that the tmp could be passed into successive calls, and each function would otherwise need to sort it. The part that's "special" is the top-most call from the user, so I think it's a good place to impose the work of ensuring the list is sorted. The guarantee will also make the list easy to use (you can check in constant time whether there's a buffer big enough for your needs, and get the smallest suitable buffer in log n time). If the list is unsorted, the user might have to worry that adding too many temporaries will actually make things slower, because it takes longer to iterate through them to check whether there's a suitable buffer.
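
As a sketch of how a routine could exploit the sorted guarantee (the helper name `smallest_fitting` is made up, and `len()` stands in for whatever size measure the arrays expose): one comparison against the first entry rules out the "nothing fits" case, and a binary search finds the smallest buffer that is still big enough.

```python
def smallest_fitting(tmp, nbytes):
    """Return the smallest buffer in `tmp` with at least `nbytes`, or None.

    Assumes `tmp` is sorted by descending size.
    """
    # Constant-time rejection: the first buffer is the largest one.
    if not tmp or len(tmp[0]) < nbytes:
        return None
    # Binary search for the last index whose buffer is still big enough.
    lo, hi = 0, len(tmp) - 1
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if len(tmp[mid]) >= nbytes:
            lo = mid          # tmp[mid] fits, so look further right
        else:
            hi = mid - 1      # tmp[mid] is too small, so look left
    return tmp[lo]


buffers = [bytearray(64), bytearray(16), bytearray(4)]
chosen = smallest_fitting(buffers, 10)   # picks the 16-byte buffer
```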

> If we go this way, it would be nice to have some tooling available that verifies that temporary buffers do not get reused as variables. I'm a bit wary of leaving a footgun for users otherwise.

Tooling's a good idea. How about a wrapper like this, that could be used in a test mode:

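def no_temp(operation):
    """Wrap operation so that any tmp keyword argument is dropped."""
    def call_with_no_tmp(*args, **kwargs):
        # Discard the offered buffers so the operation must allocate its own.
        kwargs.pop("tmp", None)
        return operation(*args, **kwargs)
    return call_with_no_tmp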

The test mode would replace all the operations with this decorated version, to check that everything still works. This would ensure that the user's code is not relying on tmp buffer side-effects.
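
Roughly, the test mode could look like the sketch below. Everything except `no_temp` is hypothetical: `without_tmp` wraps every public callable in a namespace, and a toy `xp` with a single buffer-reusing `negate` stands in for a real array library. `no_temp` is repeated so the block runs on its own.

```python
import types


def no_temp(operation):
    def call_with_no_tmp(*args, **kwargs):
        kwargs.pop("tmp", None)
        return operation(*args, **kwargs)
    return call_with_no_tmp


def without_tmp(namespace):
    """Return a copy of `namespace` whose public callables drop tmp."""
    wrapped = types.SimpleNamespace()
    for name, value in vars(namespace).items():
        if callable(value) and not name.startswith("_"):
            value = no_temp(value)
        setattr(wrapped, name, value)
    return wrapped


def negate(xs, tmp=None):
    """Toy operation that reuses an offered buffer when it fits."""
    out = tmp[0] if tmp and len(tmp[0]) == len(xs) else [0] * len(xs)
    for i, x in enumerate(xs):
        out[i] = -x
    return out


xp = types.SimpleNamespace(negate=negate)
test_xp = without_tmp(xp)


def careful(xp):
    X = [1, 2]
    X = xp.negate(X, tmp=[X])   # uses the return value
    return X


def sloppy(xp):
    X = [1, 2]
    xp.negate(X, tmp=[X])       # relies on X being overwritten
    return X


assert careful(xp) == careful(test_xp)   # unaffected by the test mode
assert sloppy(xp) != sloppy(test_xp)     # the test mode exposes the bug
```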
