Replies: 2 comments
-
Thanks @honnibal for the thoughts, and apologies for the long delay in replying (the second half of Sep was a bit of a train wreck that I only just recovered from).
Completely agree. After you brought that up on Twitter I already had multiple discussions about this topic with people. We just published the array API standard for review, and mutability was probably the single most complex topic to get to a sensible solution for. We did leave out
This issue has the most detailed discussion, and there's a similar idea to your
Do you have much use for multiple buffers? My experience with
That argument does make sense I think - provided it indeed becomes a standard convention.
If we go this way, it would be nice to have some tooling available that verifies that temporary buffers do not get reused as variables. I'm a bit wary of leaving a footgun for users otherwise. |
Beta Was this translation helpful? Give feedback.
-
No worries about the delay. If you're anything like me, your github notifications are pretty flooded anyway, so it's really easy to miss stuff! I actually only just happened to think of this thread...
I think we want the standard to offer enough efficiency support. If it doesn't offer enough, r
8000
outines will start getting tempted to introduce their own ad hoc mechanisms as well. I'm thinking that you want to be able to pass the
I'm thinking that the
Tooling's a good idea. How about a wrapper like this, that could be used in a test mode:
The test mode would replace all the operations with this decorated version, to check that everything still works. This would ensure that the user's code is not relying on tmp buffer side-effects. |
Beta Was this translation helpful? Give feedback.
Uh oh!
There was an error while loading. Please reload this page.
-
(These thoughts are not organised enough for an RFC, and I'm not sure whether the topic is covered elsewhere)
Most numpy routines support an argument
out
, into which the output of the operation should be stored. This is primarily intended to reduce data allocation and copying. However, this API design has problems in both its generality and its flexibility.Whenever we're offering an affordance in the API that aims to let the user help us be more efficient, we should take care that the affordance won't force an alternate implementation to be less efficient instead. The
out
argument falls into exactly this problem.Let's say I'm using a numpy-like API, and I'm in a position where I could pass the routine an
out
buffer. I'm indifferent about this semantically, I just want to help it be efficient. So I pass theout
buffer. But now the routine is forced to use myout
buffer. If it used a temporary buffer to compute its result, it might have to make an extra copy.The single
out
argument is also lacking in flexibility. I might have a routine that needs to allocate a large temporary buffer to do its work. If the user has this buffer handy (for instance, if they're working in a loop), they should have the chance to pass it in. But I'll need to come up with an ad hoc argument for this, and the user won't necessarily expect that the option is available.My suggestion is to let routines take an optional argument that is a list of arrays sorted by descending size. Implementations may use these buffers as working memory instead of making allocations, and they may even use these buffers as return values. But they are not forced to use the buffers --- the user merely offers them as an option to the implementation, and the implementation can do whatever is most efficient.
I would use a short name for this, specifically
tmp
. If it's a standard convention, I think it's worth the upfront cost of the less intuitive name, in order to get the extra conciseness everywhere else.With numpy's
out
you can write either of the following:With
tmp
, you would write:Not assigning the return value might happen to work, depending on the implementation you were calling into, but it wouldn't be correct to rely on it.
Beta Was this translation helpful? Give feedback.
All reactions