BUG: DLPack cannot export readonly buffers and raises wrong exception · Issue #20742 · numpy/numpy
Closed
leofang opened this issue Jan 5, 2022 · 30 comments · Fixed by #22542
@leofang
Contributor
leofang commented Jan 5, 2022

Describe the issue:

The root issue is that NumPy's DLPack support currently refuses to export readonly arrays:

if (!(PyArray_FLAGS(self) & NPY_ARRAY_WRITEABLE)) {
    PyErr_SetString(PyExc_TypeError, "NumPy currently only supports "
                    "dlpack for writeable arrays");
    return NULL;
}

which limits its usefulness in situations where downstream consumers only need to read the data or wrap it in the same readonly fashion (e.g. JAX, MPI). IMHO NumPy should not make this abrupt decision for consumers, but instead just raise a warning.

But as a short-term solution before we converge, it's more important that a BufferError, instead of a TypeError, is raised, in order to align with the Python buffer protocol. In mpi4py we do not anticipate a TypeError, and it causes this test to fail in our CI, which is a regression:
https://github.com/mpi4py/mpi4py/blob/02a6f31f6cc4a5779c6414b993a714caa2bf93c6/test/test_msgspec.py#L399-L404
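
A condensed sketch of what that test checks (illustrative only, not mpi4py's actual test code; np._from_dlpack is the private spelling in NumPy 1.22):

import unittest

import numpy as np

class TestReadonlyExport(unittest.TestCase):
    def test_not_writeable(self):
        a = np.arange(10)
        a.flags.writeable = False
        # Consumers written against the buffer protocol expect the
        # BufferError/ValueError family; NumPy's TypeError escapes this
        # assertion and fails the CI.
        with self.assertRaises((BufferError, ValueError)):
            np._from_dlpack(a)

if __name__ == "__main__":
    unittest.main()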

cc: @dalcinl @mattip

Reproduce the code example:

import numpy as np

a = np.arange(10)
a.flags.writeable = False  # mark the array readonly
b = np._from_dlpack(a)     # raises TypeError instead of BufferError

Error message:

https://github.com/mpi4py/mpi4py-testing/runs/4676901782?check_suite_focus=true

======================================================================
ERROR: testNotWriteable (test_msgspec.TestMessageSimpleNumPy)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "mpi4py/MPI/asbuffer.pxi", line 60, in mpi4py.MPI.PyMPI_GetBuffer
ValueError: buffer source array is read-only

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/runner/work/mpi4py-testing/mpi4py-testing/mpi4py/test/test_msgspec.py", line 403, in testNotWriteable
    self.assertRaises((BufferError, ValueError),
  File "/opt/hostedtoolcache/Python/3.9.9/x64/lib/python3.9/unittest/case.py", line 739, in assertRaises
    return context.handle('assertRaises', args, kwargs)
  File "/opt/hostedtoolcache/Python/3.9.9/x64/lib/python3.9/unittest/case.py", line 201, in handle
    callable_obj(*args, **kwargs)
  File "/home/runner/work/mpi4py-testing/mpi4py-testing/mpi4py/test/test_msgspec.py", line 133, in Sendrecv
    MPI.COMM_SELF.Sendrecv(sendbuf=smsg, dest=0,   sendtag=0,
  File "mpi4py/MPI/Comm.pyx", line 328, in mpi4py.MPI.Comm.Sendrecv
  File "mpi4py/MPI/msgbuffer.pxi", line 447, in mpi4py.MPI.message_p2p_recv
  File "mpi4py/MPI/msgbuffer.pxi", line 433, in mpi4py.MPI._p_msg_p2p.for_recv
  File "mpi4py/MPI/Comm.pyx", line 199, in mpi4py.MPI.Comm.Split
  File "mpi4py/MPI/msgbuffer.pxi", line 138, in mpi4py.MPI.message_basic
  File "mpi4py/MPI/asbuffer.pxi", line 264, in mpi4py.MPI.getbuffer
  File "mpi4py/MPI/asbuffer.pxi", line 64, in mpi4py.MPI.PyMPI_GetBuffer
  File "mpi4py/MPI/asbuffer.pxi", line 62, in mpi4py.MPI.PyMPI_GetBuffer
  File "mpi4py/MPI/asdlpack.pxi", line 193, in mpi4py.MPI.Py_GetDLPackBuffer
TypeError: NumPy currently only supports dlpack for writeable arrays

NumPy/Python version information:

NumPy 1.22.0

@seberg
Member
seberg commented Jan 5, 2022

Could you first standardize that a BufferError must be raised in the __dlpack__ specs? The buffer protocol does this.
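
For reference, a minimal illustration of how CPython's buffer protocol signals a failed operation with a BufferError (standard library only):

import io

buf = io.BytesIO(b"data")
view = buf.getbuffer()  # a live buffer-protocol export
try:
    buf.truncate(0)     # refused while the export is outstanding
except BufferError as exc:
    print(exc)          # Existing exports of data: object cannot be re-sized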

@leofang
Contributor Author
leofang commented Jan 5, 2022

@seberg I have two thoughts. First, I do like it for better clarity. However I am opposed to adding new spec to DLPack at this point because

  1. the Array API v2021 is concluding and other libraries (including NumPy itself) have implemented it
  2. it's a NumPy-only behavior that is not seen elsewhere (CuPy, PyTorch, mpi4py, etc.). To me this is simply an improper implementation decision that slipped through my reviews during ENH: Implement the DLPack Array API protocols for ndarray. #19083
  3. If we decide to support readonly arrays in NumPy (again, CuPy/PyTorch do not have the readonly concept, only NumPy/JAX), the above discussion is moot, so why bother wasting our time?

@seberg
Member
seberg commented Jan 5, 2022

the Array API v2021 is concluding and other libraries (including NumPy itself) have implemented it

Include it for the next version? If this should raise a BufferError, what about the raise ValueError('Unknown dtype') that cupy raises for unsupported dtypes? Fixing it in the SPEC will clear that up.

it's a NumPy-only behavior that is not seen elsewhere (CuPy, PyTorch, mpi4py, etc.). To me this is simply an improper implementation decision that slipped through my reviews during ENH: Implement the DLPack Array API protocols for ndarray. #19083

This is one of the points that __dlpack__ is fuzzy about, and in my biased opinion it did not have to be – I guess it is simply that a tricky problem was never solved or quite nailed down.

Now, implicit __dlpack__ consumption was removed, since it is only included as from_dlpack, which mostly side-steps the issue. But I doubt it is documented that __dlpack__ may return readonly data (i.e. data that crashes or creates strange results when written to) and that this is a user problem; and thus, while array_api.asarray() is not currently documented to read __dlpack__, other libraries may want to make sure to follow that choice.

If we decide to support readonly arrays in NumPy (again, CuPy/PyTorch do not have the readonly concept, only NumPy/JAX), the above discussion is moot, so why bother wasting our time?

Well, specifying that BufferError should be used when export fails is much more generally useful than just here anyway, so I don't see how it wastes time.
Of course, personally I still think this is an unnecessary design flaw in __dlpack__: having clear machinery for discussing solutions that would make this a consumer problem rather than an exporter problem would be worthwhile, but instead we are forced to just accept that this is something end-users have to be aware of, and that the conservative NumPy path (for now) is just a nuisance.

So, nailing these things down in the SPEC will go a long way to making it not painful to give my blessing – it will at least make it clear that it isn't just being ignored because a solution is tricky. But you don't need my blessing to move this forward, so all good...

@leofang
Contributor Author
leofang commented Jan 6, 2022

Include it for the next version? If this should raise a BufferError, what about the raise ValueError('Unknown dtype') that cupy raises for unsupported dtypes? Fixing it in the SPEC will clear that up.

Don't you think even ValueError makes better sense than TypeError? First, in the buffer protocol a TypeError is only raised when an object does not provide buffer access at all, but here we are consuming a NumPy array that does provide DLPack access. Next, following NumPy's own conventions, it is common to raise ValueError when an argument does not meet the requirements of a routine.
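
A minimal illustration of that convention in CPython:

try:
    memoryview(object())  # object() does not implement the buffer protocol
except TypeError as exc:
    print(exc)  # memoryview: a bytes-like object is required, not 'object'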

This is one of the points that __dlpack__ is fuzzy about, and in my biased opinion it did not have to be – I guess it is simply that a tricky problem was never solved or quite nailed down.

Yes. Now that I look back, I think we had better revisit @rgommers's data-apis/array-api#191 (comment) in the linked issue. Apparently NumPy picked one of his suggestions (raise an exception), but it led to this very issue.

Now, implicit __dlpack__ consumption was removed, since it is only included as from_dlpack, which mostly side-steps the issue. But I doubt it is documented that __dlpack__ may return readonly data (i.e. data that crashes or creates strange results when written to) and that this is a user problem; and thus, while array_api.asarray() is not currently documented to read __dlpack__, other libraries may want to make sure to follow that choice.

Sorry, I am not sure I follow here. What does it have to do with [array_api.]asarray()?

Well, specifying that BufferError should be used when export fails is much more generally useful than just here anyway, so I don't see how it wastes time. Of course, personally I still think this is an unnecessary design flaw in __dlpack__: having clear machinery for discussing solutions that would make this a consumer problem rather than an exporter problem would be worthwhile, but instead we are forced to just accept that this is something end-users have to be aware of, and that the conservative NumPy path (for now) is just a nuisance.

I now think that a DLPack exporter should deliver the readonly flag and let the consumer decide what to do with it, instead of shutting the door right away. Let me see if we can get this discussed a bit in the Array API meeting tomorrow.
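
A purely hypothetical sketch of that consumer-side idea (DLPack carried no readonly flag at the time; np.from_dlpack is the public spelling since NumPy 1.23, and the flag propagation below is made up for illustration):

import numpy as np

def from_dlpack_readonly_aware(x):
    # Import via DLPack, then mirror the producer's readonly flag on the
    # result instead of having the producer refuse the export. Assumes the
    # producer exposes NumPy-style .flags and that the export itself
    # succeeds, which is exactly what NumPy refuses to allow today.
    arr = np.from_dlpack(x)
    if hasattr(x, "flags") and not x.flags.writeable:
        arr.flags.writeable = False  # clearing the flag is always permitted
    return arr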

So, nailing these things down in the SPEC will go a long way to making it not painful to give my blessing – it will at least make it clear that it isn't just being ignored because a solution is tricky. But you don't need my blessing to move this forward, so all good...

Don't give up, Sebastian! I am sure we'll figure it out 😂

@seberg
Member
seberg commented Jan 6, 2022

Don't you think even ValueError makes better sense than TypeError?

Maybe, but if you expect NumPy to raise a BufferError when export as DLPack is impossible (for this reason), shouldn't you also expect cupy to do the same when export is impossible because the buffer has a dtype not supported by DLPack?

In mpi4py you can rely on the BufferError for the buffer-protocol, because it is clearly specified. Specifying it for __dlpack__ will make sure you can rely on it here as well, since NumPy may not be the only one making this choice.

@leofang
Contributor Author
leofang commented Jan 6, 2022

Maybe, but if you expect NumPy to raise a BufferError when export as DLPack is impossible (for this reason), shouldn't you also expect cupy to do the same when export is impossible because the buffer has a dtype not supported by DLPack?

If NumPy thinks it is a better choice and jumps right on it, I can convince the team to follow. We picked this choice way way way before NumPy had it or Array API started, but I am sure we would like to align with NumPy if this is the ultimate decision. cc: @kmaehashi

@seberg
Member
seberg commented Jan 6, 2022

If NumPy thinks it is a better choice and jumps right on it, I can convince the team to follow.

To be clear, I had not thought about whether ValueError would be more reasonable than TypeError; BufferError seems very much a great choice, and I am happy to put in a PR for that.

My whole point is that the fact that cupy et al. also use something that isn't a BufferError for a similar purpose seems like a decent argument that nailing down a choice in the spec may be worthwhile, unless you say this is not really a bug, but just a small tweak to help out mpi4py.

EDIT: Or in other words: NumPy should not be the source of truth for cupy here. The spec should be the source of truth for both.

@leofang
Contributor Author
leofang commented Jan 6, 2022

NumPy should not be the source of truth for cupy here. The spec should be the source of truth for both.

For most cases NumPy is the source of truth for us (CuPy). This case is special only because

  1. CuPy (and other DL frameworks) already had from_dlpack since long ago
  2. CuPy added this routine to the main namespace
  3. This routine was adopted (and revised) by the Array API, so now it also appears in the xp.array_api namespace

so yes we probably have some wiggle room here.

For mpi4py, @dalcinl probably will have time to jump in later this week, but as I said at the very beginning, the choice of exception type is just a short-term workaround; the best scenario is that NumPy also exports readonly arrays (see #20742 (comment), for example). Again, if NumPy (and other libraries that support the readonly flag) does this, the discussion is moot.

@rgommers
Member
rgommers commented Jan 6, 2022

Interesting discussion, and yes there's more work to do. Just a note that we've just arranged an internship to work on DLPack support for 3 months starting end of January - it's high on my wish list to work on docs, rough edges and extensions in DLPack itself.

My 2c on the decision here: NumPy has zero users for this functionality yet; we released support a few days ago. So let's adjust NumPy to CuPy et al. where that makes sense, and not ask CuPy to make backwards-incompatible changes unless it means ending up with significantly improved behaviour.

@seberg
Member
seberg commented Jan 6, 2022

So let's adjust NumPy to CuPy et al. where that makes sense

All I would like is to have a clear basis/decision. Obviously NumPy can change everything easily, but if a BufferError is better, then why not use it? Why not force CuPy, etc. to make that incompatible change?

Right now:

torch.BoolTensor([False]).__dlpack__()
# RuntimeError: Bool type is not supported by dlpack

Whereas cupy will give the better ValueError? If it is desirable to raise an error that a consumer can rely on when export is not possible due to protocol limitations, IMO that must be specified in the SPEC. BufferError seems obviously a good choice, so making the choice to break cupy here seems possibly a good idea (although not necessary).

@leofang
Contributor Author
leofang commented Jan 7, 2022

Interesting discussion, and yes there's more work to do. Just a note that we've just arranged an internship to work on DLPack support for 3 months starting end of January - it's high on my wish list to work on docs, rough edges and extensions in DLPack itself.

To follow up: in the Array API meeting yesterday it was decided that the intern will help us tie up the loose ends -- revising the DLPack spec if needed -- so that this and several other issues can be addressed properly. Spoiler alert: we'll likely want to export a readonly tensor anyway and find a way to let the consumer decide how to deal with it. Before that happens, let's keep this issue open and not rush into any change.

@seberg
Member
seberg commented Jan 7, 2022

Great :). I am not sure this is an intern-sized problem, but if you have a good intern who is interested in figuring this out, it hopefully can be. (I still think more of this work clearly should have been done ahead of time, and things would have been much easier and better now.)

Independently though, a decision to have a BufferError (and a PR) here would be nice if you want it to be backported for 1.22.1.

@leofang
Contributor Author
leofang commented Jan 7, 2022

What's the cutoff date for 1.22.1?

@seberg
Member
seberg commented Jan 7, 2022

Dunno, let's assume there will be one pretty soon (there are some bugs that need fixing, but also still some open ones, so best case 1-2 weeks I would say). But 1.22.2 is likely not far off after that, so it probably mainly matters how quickly you want a fix.

@leofang
Contributor Author
leofang commented Jan 9, 2022

Thanks, Sebastian. One more quick question:

Independently though, a decision to have a BufferError (and a PR) here would be nice if you want it to be backported for 1.22.1.

Would this suggestion necessarily involve revising the DLPack spec as you suggested earlier?

@seberg
Member
seberg commented Jan 9, 2022

Look, I just feel that when we change it nobody will want to change it again. So if NumPy+mpi4py agree on this it will either standardize this effectively or give us headaches later if something else gets standardized. I do not think that standardizing anything is my decision to make, though.

@seberg
Member
seberg commented May 4, 2022

Pushing this off, but if we want to change the exception, that is possible, there is still time until the release.

@ogrisel
Contributor
ogrisel commented Sep 5, 2022

Irrespective of the type of exception raised, the fact that non-writeable arrays cannot be exported with __dlpack__ prevents the regular usage of the __dataframe__ protocol in pandas, as documented in:

The dataframe interchange protocol itself is being drafted here:

and it explicitly relies on __dlpack__ to map the column buffers:

@seberg
Member
seberg commented Sep 5, 2022

Unfortunately, NumPy cannot fix this. This requires a push from the dlpack side and I am not sure how to do that. Maybe you can also comment there? The main issue is probably dmlc/dlpack#104

I personally do think that DLPack not extending its ABI is quickly becoming a critical issue for any further adoption, and it needs to be cleared up ASAP.

@rgommers
Member
rgommers commented Sep 5, 2022

Unfortunately, NumPy cannot fix this.

Minor correction: we can in principle do this in NumPy, just like JAX happily exports its read-only arrays. We simply decided not to do it when merging NumPy's DLPack support, because it was a sticking point and there was no consensus on doing it the same way as JAX. Fully agreed that it would be better to extend DLPack, but it's turned out to be quite hard to move forward (and not for lack of effort by @tirthasheshpatel and @mattip). So for now it looks like we've just parked the issue until there is renewed effort (or more urgent need) to push this forward.

It seems like this particular issue that @ogrisel is running into will be straightforward to resolve on the Pandas side though.

@dalcinl
dalcinl commented Sep 5, 2022

@rgommers What about supporting the thing if some environment variable is set? Previous practice would support my proposal. Additional mechanisms could be added, but an environment variable is a good start, and much better than nothing.

@seberg
Member
seberg commented Sep 5, 2022

@dalcinl we can add an env variable as a stop-gap for experimentation and testing. But it is not a long-term solution IMO.
I am not a fan of the unsafe export of buffers that are either "immutable" (should not be changed) or truly readonly (must not be changed), and clearing that up in dlpack should be a priority in my opinion.

So, we can provide work-arounds, we could even do the unsafe export. But on the unsafe export I would need some convincing and input from a wider audience that they feel that is OK.
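
A rough sketch of what such an opt-in stop-gap could look like (the environment variable name is hypothetical, and the real check would live in NumPy's C exporter; Python is used here only for illustration):

import os

import numpy as np

ALLOW_READONLY = os.environ.get("NUMPY_DLPACK_ALLOW_READONLY", "0") == "1"

def check_dlpack_export(arr: np.ndarray) -> None:
    # Refuse readonly arrays unless the user explicitly opted in to the
    # unsafe export; a consumer may still write to the exported buffer.
    if not arr.flags.writeable and not ALLOW_READONLY:
        raise BufferError("cannot export readonly array through DLPack")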

@dalcinl
dalcinl commented Sep 5, 2022

@seberg I'm 100% on your side regarding the need to find a proper solution. My environment variable proposal is just a way to strengthen NumPy's statement of need and provide some alternative to nothing/nichts/nada/niente/rien. With an environment variable or any other opt-in mechanism, it is ultimately the end-user who decides whether to activate the feature and takes responsibility for the outcome. A little extra freedom for these poor souls cannot harm them that much 😄, right?

@rgommers
Member
rgommers commented Sep 5, 2022

I'd prefer to start doing the export unconditionally over an env var, but if the env var is the one thing on the table without a long discussion now, I think I'm okay with that.

One issue with env vars is that they stick around for too long. Case in point: I thought we had gotten rid of NUMPY_EXPERIMENTAL_ARRAY_FUNCTION - fairly sure that was the plan, since it's certainly not experimental anymore. I think if we add one now, we can only remove it once we either have DLPack support for a readonly flag, or we have moved to unconditionally exporting. It looks like we want one of those things, so perhaps it's okay.

Maybe we should not document the env var, so that people who come look for it comment here with a "hey I needed this as well"?

@dalcinl
dalcinl commented Sep 5, 2022

I think if we add one now, we can only remove it once we either have DLPack support for a readonly flag, or we have moved to unconditionally exporting.

Contrary to NUMPY_EXPERIMENTAL_ARRAY_FUNCTION, once (if ever) readonly export becomes the default (either through proper DLPack support or through unconditional exporting), I hardly see the need to keep the env var to disable the export.

Maybe we should not document the env var, so that people who come look for it comment here with a "hey I needed this as well"?

Not documenting the env var may be too much. NumPy could advertise it, declare it provisional, and warn that its support and/or behavior may change at any time. And then perhaps point to a polling place (a GH issue?) where people can upvote "I need this" to gain lobbying power. I believe having to set the env var (and remember it) to get things working will be annoying enough to guarantee the votes you want.

PS: Sorry, I'm a bit confused. What's the root issue here? Is it lack of agreement on how to extend DLPack? Is it lack of contributors to implement the new features? Or is it DLPack's refusal to extend and evolve the standard?

@rgommers
Member
rgommers commented Sep 5, 2022

What's the root issue here? Is it lack of agreement on how to extend DLPack? Is it lack of contributors to implement the new features? Or is it DLPack's refusal to extend and evolve the standard?

The main issue is that DLPack is a few C structs that are currently not extendable. So an ABI break is needed to make it extendable, add versioning, etc. Doing so in a way that works for both Python and C/C++ users and is not overly disruptive to the current user base is highly nontrivial.

Significant work has been put into proposals, but (a) the maintainers of DLPack have quite limited bandwidth, and (b) they are reluctant to now work on this for what is at least partially "future-proofing". The need for adding alignment info is probably the most compelling argument right now, a readonly flag is nice to have but not all that compelling by itself.

@leofang
Contributor Author
leofang commented Sep 6, 2022

Significant work has been put into proposals, but (a) the maintainers of DLPack have quite limited bandwidth, and (b) they are reluctant to now work on this for what is at least partially "future-proofing".

Sorry, I don't get this. Why do we need the DLPack maintainers to spare any bandwidth (other than clicking "merge") if we can find people to contribute (like what Tirth was doing)? I am not a fan of a community infrastructure being blocked because of this. To me this suggests that DLPack was not the ideal choice if it can't evolve as needed.

@rgommers
Member
rgommers commented Sep 6, 2022

I think it's a bit more complicated than that. Breaking the ABI at all is a big decision, and it'd need a very carefully documented and communicated strategy. If someone showed up for NumPy with a ready-to-go PR including some docs and a version bump, I'm fairly sure we'd not hit the merge button just like that :)

@leofang
Contributor Author
leofang commented Sep 7, 2022

I was perhaps exaggerating a bit. I didn't mean it's a simple task or anything like that. I already recognized that it's a community infrastructure, and naturally any change would involve reaching consensus among the majority of community maintainers.

What I meant is that if someone can lead said effort, it should not take too much bandwidth from the DLPack maintainers. What matters more is the green checks from downstream (more is better). So I disagree that the burden is on Tianqi et al. (again, under the assumption that someone can work on this).

@seberg seberg modified the milestones: 1.24.0 release, 1.25.0 release Oct 20, 2022
@seberg
Member
seberg commented Oct 20, 2022

Pushing off the milestone; probably could also just close the issue: right now I assume DLPack will receive the update and fix this eventually. If anyone wants to look into a NumPy-side work-around, that might be a different issue.

Although, we could still change it to reliably give a BufferError for all errors that mean the array is incompatible with DLPack.

EDIT: Leaving milestone for the possible error change only for now, do not hesitate to move it.
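
Once the exception type is reliable, a consumer can branch on it uniformly; a rough sketch of the pattern (not mpi4py's actual code):

def get_exported_buffer(obj):
    # Try DLPack first; a uniform BufferError makes the fallback safe.
    try:
        return obj.__dlpack__()
    except BufferError:
        return memoryview(obj)  # fall back to the Python buffer protocol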

@seberg seberg modified the milestones: 1.25.0 release, 1.24.0 release Oct 20, 2022
seberg added a commit to seberg/array-api that referenced this issue Oct 20, 2022
I would actually like MUST, but since most implementations currently don't
actually give a ``BufferError``, writing it as SHOULD instead.

First part of closing numpy/numpy#20742
seberg added a commit to seberg/numpy that referenced this issue Nov 7, 2022
See also data-apis/array-api#498.

I think we should just change this.  It is a niche feature and just
an error type change.

Closes numpygh-20742
kgryte added a commit to data-apis/array-api that referenced this issue Nov 14, 2022
* API: Specify that DLPack should use BufferError

I would actually like MUST, but since most implementations currently don't
actually give a ``BufferError``, writing it as SHOULD instead.

First part of closing numpy/numpy#20742

* Fix duplicated word

Co-authored-by: Sebastian Berg <sebastianb@nvidia.com>
Co-authored-by: Athan <kgryte@gmail.com>