DLPack support for NumPy #19013
For other commenters: the discussion at data-apis/consortium-feedback#1 is quite lengthy but does include answers to many questions like object lifetime management. It is helpful to understand the context. Many of my concerns were addressed there. A side issue is the struct itself: the header mentions "[The data] pointer is always aligned to 256 bytes as in CUDA". If this is a hard requirement, NEP 49 may help achieve this in a zero-copy manner. I think this can be done separately from #18585.
I have tried to look through the discussion. I still don't understand why the "consumed exactly once" rule is important (the PyCapsule is reference counted and can clean up on delete, so the memory management seems fine – although maybe that is tricky without reference counts). But maybe there is something about streams that I just don't see right now.

For the API, I think I am still a bit confused about view vs. copy. Especially if the alignment of the memory allocation in NumPy can lead to a copy being returned instead of a view, that seems like it must at least be easy to check. But in general I think it just needs a good PR. In the end I don't really want to worry about it too much if other projects already implemented basically the same thing (and it seems like they did) – so long as NumPy isn't the odd one out, due to being the only one that has to copy for reasons like alignment or byte-swapping.
Exactly, it's not that important. It came from before we improved the Python API; the old way of doing it in other libraries was:

```python
# `x` is some array/tensor object
capsule = to_dlpack(x)
x2 = from_dlpack(capsule)
```

At that point you had a (non-useful) PyCapsule object floating around in the user's code, and nothing stopping it from being consumed twice.
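For contrast, a minimal sketch of the improved Python API this thread assumes, where the capsule stays an implementation detail of the consuming library (per the `__dlpack__`/`from_dlpack` names in the issue description below):

```python
# `x` is any object implementing `__dlpack__`; the consuming library
# creates and consumes the capsule internally, so none leaks to users.
x2 = from_dlpack(x)
```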
I think we can make this a keyword-only argument without breaking compat with the existing protocol, and default it to the current behaviour.
I believe not; in that case the allocation still isn't "aligned", unfortunately.
For other commenters: this is due to DLPack counting strides in elements instead of in bytes.
This is, indeed, a sticking point. I do not know how to resolve this. Perhaps this restriction could be clarified or removed if the rationale didn't apply to CPU data pointers. cc @tqchen would you happen to know the answer?
Why would we want this? The semantics of a program using DLPack should not rely on view vs. copy (avoid in-place ops). The protocol tries to return a view for performance reasons, but that may not always be possible for all libraries – and then it will return a copy. Having an explicit keyword for it doesn't seem necessary.
Usually, performance. One may want the flexibility of a different path when a view isn't possible, e.g. performing the operation in the existing library rather than moving the data to a new one using DLPack. Or, conversely, one may want to force a copy because in-place ops are needed for an algorithm/function (I guess one can always copy explicitly afterwards). Of course, the default would be to force neither.
Performance is already optimal; it defaults to zero-copy.
Exactly, that's about zero characters extra, so if you want to do something numpy-specific, use that. There are libraries that don't even have the concept of a view, and the buffer protocol doesn't have such a switch either.
But that is not a stated requirement anywhere that I found yet? And apparently NumPy cannot always export zero-copy (which is fine, but then we need an additional flag). If the user wants to copy the data, they may have to do so twice, because they don't know whether the original array-like was copied already – either because the exporter always copies, or because it does so "if necessary", like NumPy apparently would.
Could you explain why this is such a strong preference – so strong that it is not worth thinking about how you could allow both exports transparently? What about copy-on-write semantics, or just plain old numpy-style usage?
Fair enough, it might help with memory management. It also might help a bit to ensure that only a single consumer will end up writing to the data. (I guess this doesn't really matter if we assume all exports are zero-copy, though.)
Hmm, I'll have a look at where the most obvious place is to mention this. DLPack works just like the buffer protocol, except that it supports devices other than the CPU, counts strides in elements rather than bytes, has no readonly flag, and hands over ownership through a capsule that is consumed exactly once.
I don't think that is true. I reviewed the implementations in other libraries like CuPy, MXNet, PyTorch and JAX a while back, and can't remember anything regarding the 256-byte alignment thing. There may be more exotic libraries that could perhaps be forced to make a copy, e.g. Vaex has a concept of "virtual columns" where the data is basically a string expression rather than living in memory. In that case, zero-copy access isn't possible. Another thing that can happen is that NumPy has, say, an array with odd strides; JAX and TensorFlow can't represent that, they only have contiguous arrays internally – hence they make a copy on the consumer side. But for our purposes, I think there isn't too much magic – just think of it as an enhanced buffer protocol.
I hadn't noticed the comment (https://github.com/dmlc/dlpack/blob/main/include/dlpack/dlpack.h#L130) before. It talks about CUDA and OpenCL, and the only mention in the issue tracker is dmlc/dlpack#1 (comment), which talks about vector types. PEP 3118 also briefly mentions alignment in its notes on format strings.
Oh wait, this explains it (from the issue description of data-apis/consortium-feedback#1):

```c
/*! \brief The offset in bytes to the beginning pointer to data */
uint64_t byte_offset;
```

That should solve the issue?
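If so, a producer that has to satisfy the 256-byte rule could round the data pointer down and carry the remainder in `byte_offset`. A minimal sketch of that arithmetic (the helper name and treating the pointer as a plain integer are just for illustration):

```python
ALIGNMENT = 256  # the alignment the DLPack header mentions

def split_for_alignment(ptr: int) -> tuple[int, int]:
    # Round `ptr` down to the previous 256-byte boundary; the invariant
    # is aligned_ptr + byte_offset == ptr, so consumers still find the
    # first element at the same address.
    byte_offset = ptr % ALIGNMENT
    return ptr - byte_offset, byte_offset
```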
I think that's more of a problem than a feature in the buffer protocol actually, and it reflects that it was written mostly from a "NumPy needs/design" point of view. Most libraries simply do not have a read-only array/tensor data structure, so if you create an array in such a library on top of read-only memory, there is nothing stopping users from writing to it anyway.
The last point is good – I think that is worth writing down in the docs.
DLPack is, by design, zero-copy. It's highly unusual for it to have to make a copy on the producer side; the Vaex virtual column is the only example I could think of.
Copy-on-write is nice, it's like the one thing I give Matlab a bit of credit for :) That's safe though, and stays within a library that would implement it – no keyword needed. Plain-old numpy-style usage: could be done if you know you are working with 2 libraries that have the same semantics (we don't have any that are 100% the same, but PyTorch and MXNet are fairly close), and you have full control over the code you write. Then you already have all the tools you need though, again no keyword needed. You can go right ahead and mutate memory that's shared between two libraries. What I had in mind was library authors: if you don't know whether you're talking to JAX, PyTorch or library X, you simply cannot reliably mix views and in-place operations.
Well, those were my main initial questions! What is the definitive answer?
This seems just wrong. The buffer protocol has a deleter mechanism, and doesn't even use reference counting itself; Python wraps the result into a `memoryview` at the Python level. It is likely possible to extend the buffer protocol with a new "device" field (although maybe a bit clunky). It might even be possible to "backport" it to older Pythons. From a protocol point of view, I think the only real difference between the two (aside from device support, feature creep in the buffer protocol, and some standardized flags) is that the buffer protocol allows the requester to pass simple flags to the exporter.
Honestly, now I am just confused and a bit annoyed. How can such a simple flag be a problem? The consumer can easily check it after all, and the exporter can always set readonly trivially. My points do not feel addressed at all unless you make two requirements explicit: exports are always zero-copy, and consumers never write to the exported data.
Those are some clear limitations. If that is the intention: OK, I don't care! But I do feel it is strange to claim that the buffer protocol is bad because it doesn't have this limitation. Why are view semantics and in-place algorithms that accept arbitrary array-likes out of scope?
Why? We do not have to support a new keyword at all. As I also said, it would be completely sufficient to have two new flags on the exported struct.
It might be nice for the protocol eventually, but I don't argue for adding it now.
Yes, but you are not addressing the questions/use case! If Matlab is copy-on-write, Matlab has to know that the provider (NumPy) may not be, and maybe just copy right away because it sees the exported buffer is "shared" and "writeable". The other way around, NumPy might want to know that it must not write to the Matlab data. It seems to me you are dismissing use-cases because you have an ideal single usage pattern in mind. Look, I do not care enough to fight over this.
Another argument that is completely fine for me is if we say: well, DLPack may grow those features/use-cases in the future, and we will even be able to add a "requests" API later without problem. At that point, the only thing needed would be to clarify expectations – e.g. that a copy should not happen, but may, and whether or not it's OK to write to a DLPack array (or whether we should, e.g., default to treating it as read-only).
That was 5 comments up, as one line, followed by lots of discussion about adding a copy keyword.
I dunno, this is what PEP 3118 lists as a "rejected idea":

> Having a "releaser" object whose release-buffer was called. This was deemed unacceptable because it caused the protocol to be asymmetric (you called release on something different than you "got" the buffer from). It also complicated the protocol without providing a real benefit.

If it does have a deleter mechanism now, maybe that's wrong – or I misunderstood, I haven't looked at the code for ages.
This was extensively discussed for about a decade I believe. No one has done it, it's a lot of work to write a new PEP, and I don't see how you would backport it. Putting things in Python itself just means you have to wait years after making any change before you can use it. It's also kind of a "proof is in the pudding": despite the buffer protocol being supported in the language itself, adoption is quite low. Unlike for DLPack, which is much newer.
As Hameer already pointed out, …
Sebastian, seriously, you are slinging a ton of accusations as well as more API proposals. We have been working on this for months in the DLPack and array-api repos (it's very much a nontrivial topic), with review/contributions by quite a few people with extensive experience using DLPack, the buffer protocol, and related interchange protocols. I'll just repeat for now: I don't think there are unsupported use cases, and hence I'm also not "dismissing" them. I just tried to answer questions as best I could.
I believe there are some (admittedly rare) cases where NumPy must copy: the strides inside DLPack assume a unit of array elements, while those inside NumPy assume a unit of bytes, which is more flexible. One can find cases where zero-copy isn't possible, and one can spot them by checking whether all elements of `arr.strides` are divisible by `arr.itemsize`.
So, yes, it's possible, rare IMO, and can be easily predicted by checking for that divisibility. But what I see as the more pressing matter to clarify is the alignment: should one just round the pointer down to the previous 256-byte boundary and add the "round down" amount to the offset?
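A minimal sketch of that divisibility check (the helper name is just for illustration):

```python
import numpy as np

def exportable_without_copy(a: np.ndarray) -> bool:
    # DLPack counts strides in elements, NumPy in bytes, so a zero-copy
    # export needs every byte stride to be a whole number of elements.
    return all(s % a.itemsize == 0 for s in a.strides)

# A packed structured dtype can produce such strides: the `b` field view
# below has a byte stride of 5 but an itemsize of 4.
rec = np.zeros(4, dtype=[("a", "i1"), ("b", "i4")])
print(exportable_without_copy(rec["b"]))  # False
```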
On my cell, so please forgive me for leaving only a drive-by comment: please ignore the alignment stuff. If we are talking about host-device copies, it's an implementation detail (of CUDA, HIP, or any sane memory pool implementation sitting on top). CUDA/HIP's copy APIs do not care about alignments. DLPack does not. CUDA Array Interface does not. No one does. Another thing: I seem to recall we agreed in an Array API meeting that if a copy is needed, it's best for the user to handle it explicitly. I can try to look up a note on this later.
Thanks for clearing that up Ralf and @leofang, sorry for missing the link to the comment where it was more clearly defined. I am not concerned about NumPy providing the export itself.
@leofang: If a copy is never done, that is good. My concern is mainly that it should be predictable – although "never" might even be nicer. (The array API says it can return a view or copy.)

@rgommers, sorry if you feel I am just blocking your productivity by making up moot issues and proposing changes to something that you have settled for yourself. Maybe I am late to the party, but can I maybe have a bit of time to think out loud? I am happy to be pointed to a different place/issue if you think this is a terrible place... To be clear, I am still just exploring limitations, and how those limitations could be removed. And yes, I may have had the wrong impression that this API could still be modified more easily if new use-cases/problems come up – easier than the buffer protocol at least. And just because I am discussing possible new API doesn't mean I am trying to shut down adding the current one as is. The limitations, as use-cases instead of "missing API", are: writeable-view semantics, in-place algorithms, and potentially unnecessary copies.
Now you did dismiss a few of these as user problems, or just not relevant enough (i.e. don't use DLPack if you write in-place algorithms or like writeable-view semantics). And the unnecessary copy is mitigated by exporters like Vaex, which must copy, being uncommon, I guess? But at least some still seem like (small?) user traps to me. Is adding one or two bits of information to DLPack on the table, or are all of these small issues that we just want to live with? And with that, sorry for a way too long post...
About the buffer protocol: you are right, I guess. The buffer protocol has only the "release function", and that could be enough to allow deleting the original object, but it is not specified to be valid (by passing a new, different "owner" object out during buffer creation – CPython's memoryviews would work with that happily, but NumPy doesn't). So yes, unlike DLPack, the buffer protocol doesn't quite allow deleting the original exporting object. So maybe the "release" function is not a proper "delete" function in that sense.
This is the thing – if you propose changes on this issue tracker, you're talking to others who then have to spend time responding to your proposals. It's different from "thinking out loud" (which I interpret as "to clarify things for yourself"). For me the appropriate places to take notes or clarify things for myself are a piece of paper, a HackMD doc, my own fork, etc. – and then ask clarifying questions first rather than make new proposals immediately. The bandwidth differential we have (you're full-time on NumPy, I'm struggling to find enough time) makes the impact of that worse. I think this is an important communication issue that comes up more often than I'd like. It'd be great if we could chat about it a little, maybe a call in the next few days – I'll ping you on Slack.
The DLPack maintainers are quite responsive, and significant changes have already been made to DLPack itself (complex dtypes support, stream support, Python API, renaming struct fields for clarity, etc.). More changes, provided they fit in the DLPack design and have good rationales, are very welcome I'm sure. That said, it'd be great to get some experience with at least a prototype of the current design first - proposing changes not based on any experience with the protocol seems a little odd. |
As annoying as it is to be on the receiving end of that, I have found that it has always been the case that the fault was mine, as the proposer. It's not your audience's fault that they weren't following all of your deliberations in real-time that would have shown how you worked out their problem cases. If it's not clearly laid out in the final proposal, it's a fair question. A ton of the PRNG design work went on in long discussion threads, but it was the written-up proposal that new audiences could actually be pointed to. Every new audience is a new discussion. You can short-circuit that discussion by drafting the proposal expansively and clearly. For whatever benefits the DLPack maintainers have gotten by keeping their RFCs strictly in Github issues, it does mean that there's no such document for you to use to forestall questions.
Thanks @rkern, that's all very fair. NEP 47, the array API docs and the DLPack docs can all certainly be improved.
Definitely, please ask those questions. Ask to see a PR first so you can play with it. And point out which are the important doc gaps to fill. It's simply kind of hard if that's mostly skipped over and we jump straight to an API extension proposal. |
Agreed. There can be multiple reasons for the disconnect (the proposal isn't what you think it is, there's a disagreement about what the consequences of the proposal actually are, or there's a disagreement about whether the consequences are desirable). A counter-proposal is only appropriate once the disconnect is revealed, if not resolved. AFAICT, there's nothing that prevents a prototype from being made outside of numpy first. There might just be a little wrapper object in the way until support lands in a release of numpy.

```python
x = other_array.from_dlpack(numpy_dlpack.to_dlpack(some_ndarray))
y = numpy_dlpack.from_dlpack(some_other_array)
```

Such a prototype package would be useful in production in order to support older versions of numpy, regardless. It's not going to solve all use cases (you won't be able to write "NEP 47-generic" functions with it), but a number of use cases involving explicit conversions can be handled. A small third-party prototype will make the discussions concrete and even premature counter-proposals more tolerable to discuss.
First, replying to myself:
I think it's clear that host-device transfer should raise in NumPy's case. Now, let me move to Robert's reply above:
Can we please not do this outside NumPy? Two reasons:
No, I saw that. But it's not essential for a prototype that can help work out some of the other details. I'm not saying you should maintain it forever (though it could be useful for working with current versions of numpy). It's noticeably easier to build and evaluate a small prototype package than it is to build and evaluate a branch of numpy, especially if what I want to test is how it relates to other packages, which have their own dependencies on versions of numpy. |
I've only skimmed the conversation here, so apologies in advance if I'm repeating things. #19013 (comment) says:
Are there technical reasons that we can't go through a PEP 3118 buffer?

```python
some_buffer = memoryview(some_ndarray)
x = other_array.from_dlpack(pep3118_dlpack.to_dlpack(some_buffer))

y_buffer = pep3118_dlpack.from_dlpack(some_other_array)
y = np.array(y_buffer)
```

That is, could a third-party `pep3118_dlpack` package translate between the buffer protocol and DLPack without NumPy needing any changes?
@eric-wieser I believe the answer is in #19013 (comment):
My question is more along the lines of "is it possible to 'smuggle' simple CPU buffers through DLPack via the buffer protocol?"
The feature set is a bit disjoint: the buffer protocol supports much more, but doesn't support devices – though I don't think that is what you mean? If there was such a translation layer, I am not sure the indirection is useful enough; hopefully DLPack is just so simple that it doesn't matter too much. Effectively, NumPy would/could be exactly that library (although not the most elegant implementation)! As far as I can tell, NumPy supports the full set of features that is supported by both DLPack and memoryview.
Feature
Motivation
Currently, any library which needs to exchange data with NumPy needs to have it as a dependency. This issue will allow moving away from that approach to a more Pythonic, standards-based approach leveraging the DLPack library, as described in the Array API standard. This is also mentioned in, and a prerequisite for, NEP 47, although it can be discussed and integrated independently of it as well – the only caveat being that if the NEP is accepted, adopting DLPack is a given.
DLPack is a small C header-only library with a stable ABI.
Changes needed to NumPy
The `numpy.ndarray` type will need to gain two new methods:

- `__dlpack__`: This will return a `PyCapsule` with a DLPack struct.
- `__dlpack_device__`: This one will always return the CPU device for NumPy.

And the NumPy namespace will gain one extra function:

- `from_dlpack`: This will consume a `PyCapsule` containing a DLPack struct and create a `numpy.ndarray` based on that. It will raise a `RuntimeError` on all unsupported configurations of the object.

Relevant issues/discussion:
Edit: Previously this issue said `from_dlpack` was a method; that has been corrected.

cc @rgommers @mattip
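A minimal sketch of how these pieces would fit together, per the description above (intended shape only, not a final implementation – in particular, the capsule-consuming form of `from_dlpack` follows this issue's wording):

```python
import numpy as np

x = np.arange(10)

# Producer side: the two new ndarray methods.
capsule = x.__dlpack__()          # PyCapsule wrapping a DLPack struct
device = x.__dlpack_device__()    # always reports the CPU device

# Consumer side: the new namespace-level function.
y = np.from_dlpack(capsule)       # ndarray built from the struct
```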