8000 ENH: Throw TypeError on operator concat on Numpy Arrays by vrakesh · Pull Request #16570 · numpy/numpy · GitHub
[go: up one dir, main page]

Skip to content

ENH: Throw TypeError on operator concat on Numpy Arrays #16570

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Jun 27, 2020

Conversation

vrakesh
Copy link
Member
@vrakesh vrakesh commented Jun 10, 2020

Fixes #16489

New to Python C-API, requesting feedback on Reference counter handling, does it look correct?

Tried to reuse Numpy concatenate.

@vrakesh vrakesh requested review from seberg and eric-wieser June 10, 2020 23:33
@vrakesh vrakesh changed the title WIP:Add support sq_concat on ndarrays WIP:Add support for sq_concat on ndarrays Jun 11, 2020
@vrakesh
Copy link
Member Author
vrakesh commented Jun 11, 2020

Follow up question, should we allow concat with regular arrays like np.concatenate?

operator.concat does not allow concat of two types . Should we keep the numpy way or follow operator.concat in this specific instance?

@eric-wieser
Copy link
Member

PyPy failure is either a (possibly unfixable) bug in PyPy, or a design problem in python itself.

@vrakesh
Copy link
Member Author
vrakesh commented Jun 11, 2020

@eric-wieser Thanks a lot for the feedback, made the changes. Quick question, should we support coercion of array like types to NDarray or follow operator.concat for this case?

@vrakesh vrakesh changed the title WIP:Add support for sq_concat on ndarrays ENH:Add support for sq_concat on ndarrays Jun 11, 2020
@mattip
Copy link
Member
mattip commented Jun 11, 2020

The test shows an inconsistancy in Python. operator.concat is documented as "Return a + b for a and b sequences." PyPy will prefer nb_add, and only if it is NULL try sq_concat. CPython seems to do something different, This PR changes the existing results for operator.concatenate and in my opinion is incorrect. Could you provide the motivation for the change?

@seberg
Copy link
Member
seberg commented Jun 11, 2020

@mattip the C-Python implementation is roughly this:

if sq_concat is not NULL:
    return sq_concat()

if isinstance(obj1, sequence) and isinstance(obj2, sequence):
    return obj1 + obj2

I think this is correct at least in the sense that the current behaviour is undesired enough that it makes sense to do something. I would be happy, however, to implement this simply as:

def sq_concat(self, other):
    raise TypeError("NumPy arrays cannot be concatenated using `operator.concat` use `np.concatenate` instead.")

if this does not work in PyPy. I agree that this is all besides the standard, but currently operator.concat(a, b) returns a + b, which is also not great?

@mattip
Copy link
Member
mattip commented Jun 11, 2020

According to the documentation, a+b is exactly what operator.concat should be returning, no? Could you point to documentation of operator.concat as your pseudo-code? I don't debate that is what CPython does do, just that is not what the Python documentation says operator.concat should do.

@mattip
Copy link
Member
mattip commented Jun 11, 2020

The CPython implementation of operator.concat calls PySequence_Concat, which is documented as

Return the concatenation of o1 and o2 on success, and NULL on failure. This is the equivalent of the Python expression o1 + o2.

Inside PySequence_Concat is this comment

       /* Instances of user classes defining an __add__() method only
       have an nb_add slot, not an sq_concat slot.      So we fall back
       to nb_add if both arguments appear to be sequences. */

so it seems a situation where both nb_add and sq_concat are set is not allowed. I am interested why @vrakesh proposed this PR @eric-wieser opened this issue: is there a use-case for changing the semantics of operator.concat on ndarrays?

Edit: @eric-wieser opened the issue.

@seberg
Copy link
Member
seberg commented Jun 11, 2020

Well, I agree with both sides. Yes, my pseudo-code is what it does, and it implements it as a+b for sequences... Now we have the problem that numpy arrays are almost, but not really, sequences.

So, I see the argument that we should maybe not try to do "magic" to make this work. But on the other hand, I am not sure there is a reason why not to raise an error? Since it is guaranteed that any user who does operator.concat(arr, arr) is not getting what they expect.

@eric-wieser noted the possibility when torch.concat came up as a different spelling for concatenate. And to me also it seemed strange, but OK to define on first sight (i.e. the current behaviour is just bad, so even something a bit hackish seemed OK to me).

@anirudh2290
Copy link
Member

giving it some thought, @mattip makes a good point, looks like it should call nb_add, which in the case of python, which when we pass numpy arrays means broadcast and do an elemwise add.

but yes very few users would expect that operator.concat would cause a broadcast followed by elemwise add and an error will help. not sure, if we go down this road though, how many more such difference in behaviors (because of numpy array being slightly different from python seq) we will have to fix.

@eric-wieser
Copy link
Member

Perhaps we should file a python issue asking for clarification.

@eric-wieser
Copy link
Member
eric-wieser commented Jun 11, 2020

In https://bugs.python.org/issue29139#msg284547, @serhiy-storchaka says:

Third-party classes (maybe NumPy arrays, I don't know) can have different implementations of sq_concat and nb_add.

@seberg
Copy link
Member
seberg commented Jun 11, 2020

Honestly, I am unwilling to fight either way. If Python really embraced it, they would need to add __concat__, until then this is all feels like an implementation detail.
I would be happy to e.g. raise an error, just because I do not see it hurting anyone, but even the raised error would be an implementation detail itself (since PyPy may behave differently).

@eric-wieser
Copy link
Member
eric-wieser commented Jun 12, 2020

Just attempted to post to python-dev about this, but it looks like my email got swallowed by the mailing list... Maybe the archives are just slow.

https://mail.python.org/archives/list/python-dev@python.org/message/HCXMI7LPWZEKKPRIJTVHN5ZRWIUXM5C3/

@vrakesh
Copy link
Member Author
vrakesh commented Jun 12, 2020

Thanks @eric-wieser for the email, hope the clarification helps us move this in the right direction.

@seberg
Copy link
Member
seberg commented Jun 16, 2020

Seems like the (small) discussion on the python mailing list basically converged to: Its an implementation detail, but putting an error could be fine.
I will let @mattip make the call on what he prefers. The best I can offer would be make it a TypeError, with text like:

Calling sequence concatenation (operator.concat) on NumPy arrays invokes undefined behaviour. Please use np.concatenate instead. This error is an implementation detail and may not be raised on all Python versions/implementations, in which case incorrect results may be returned.

EDIT: We would not really be able to test it, or at least skip it on PyPy and any other python implementation which behaves differently

@vrakesh vrakesh changed the title ENH:Add support for sq_concat on ndarrays ENH: Throw TypeError on operator concat on Numpy Arrays Jun 19, 2020
@vrakesh
Copy link
Member Author
vrakesh commented Jun 22, 2020

@seberg @mattip Does this look good? Am I missing any specific changes related to PyPy?

// Throw a type error, when trying to concat NDArrays
if(PyArray_Check(other) || PyArray_Check(self)) {
PyErr_SetString(PyExc_TypeError,
"Concatenation operation is not implemented for Numpy Arrays");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe it's worth highlighting np.concatenate in the error message? Only as long as it doesn't make things too verbose.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, makes sense to redirect

@seberg
Copy link
Member
seberg commented Jun 22, 2020

You have to skip the test on PyPy (there is an IS_PYPY variable). Also PyArray_Check(self) will always be true, so you don't even have to put it.

@vrakesh
Copy link
Member Author
vrakesh commented Jun 22, 2020

Thank you @seberg will make the changes.

@vrakesh vrakesh force-pushed the concat_sequence branch 3 times, most recently from c1edafe to dbfcc84 Compare June 27, 2020 00:37
@seberg seberg force-pushed the concat_sequence branch from dbfcc84 to 73b844f Compare June 27, 2020 15:32
Note that this is a Python implementation detail and *not* a language
standard. Users should not rely on the error being set, as it may not
be set e.g. on PyPy. (See numpygh-16570 for discussion, this has been
briefly discussed on Python as well:
https://mail.python.org/archives/list/python-dev@python.org/message/HCXMI7LPWZEKKPRIJTVHN5ZRWIUXM5C3/

Fixes numpygh-16489
@seberg seberg force-pushed the concat_sequence branch from 73b844f to 8ac7ac9 Compare June 27, 2020 15:36
@seberg
Copy link
Member
seberg commented Jun 27, 2020

Thanks Rakesh!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

operator.concat does not concatenate ndarrays
7 participants
0