ValueError: invalid __array_struct__ when assigning to object-dtype ndarray · Issue #16939 · numpy/numpy · GitHub

ValueError: invalid __array_struct__ when assigning to object-dtype ndarray #16939

Closed
TomAugspurger opened this issue Jul 24, 2020 · 19 comments · Fixed by #16943

@TomAugspurger
TomAugspurger commented Jul 24, 2020

Hi. Pandas is broken with numpy master :)

Reproducing code example:

In [1]: import numpy as np
In [2]: out = np.empty(3, dtype="object")
In [3]: out[:] = [np.int64, np.int64, np.int64]

Error message:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-3-d20bd885ed9b> in <module>
----> 1 out[:] = [np.int64, np.int64, np.int64]

ValueError: invalid __array_struct__

Numpy/Python version information:

Installed from a wheel with pip install -i https://pypi.anaconda.org/scipy-wheels-nightly/simple numpy

1.20.0.dev0+2d12d0c 3.7.6 (default, Dec 30 2019, 19:38:28)
[Clang 11.0.0 (clang-1100.0.33.16)]

I don't immediately see what commit could have caused this. #16850 mentions object-dtype scalars, but it doesn't look quite like the right one.
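
(For reference, a minimal sketch of what the same assignment produces on a released NumPy such as 1.19.x, where it does not raise; the dtype classes are simply stored as Python objects.)

```
import numpy as np

# On a released NumPy (e.g. 1.19.x) this assignment succeeds and the
# type objects are stored as-is in the object-dtype array.
out = np.empty(3, dtype="object")
out[:] = [np.int64, np.int64, np.int64]
print(out)
# [<class 'numpy.int64'> <class 'numpy.int64'> <class 'numpy.int64'>]
```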

@rgommers
Member

The SciPy builds on TravisCI with NUMPYSPEC='--pre' also just started failing (e.g. https://travis-ci.org/github/scipy/scipy/jobs/711362812, with stats broadcasting, test_methods_with_lists, and TestDifferentialEvolutionSolver.test_args_tuple_is_passed failures); this may be related.

@charris
Member
charris commented Jul 24, 2020

Probably because I fixed the nightly Cython-related build failures and triggered new builds yesterday. The nightly builds had stalled out :)

@charris
Member
charris commented Jul 24, 2020

@seberg Thoughts?

@seberg
Member
seberg commented Jul 24, 2020

I am a bit surprised I did not notice additional SciPy failures when comparing to master. The SciPy ones are reverse broadcasting; I thought I covered that, but apparently I missed something.

The problem is this type of code:

bounds     = [(-np.inf, np.inf), (np.array([2]), np.array([3]))]
np.array(bounds)

which only works because float(np.array([2])) happens to work for NumPy arrays specifically. I honestly think those failures are better, but I have to see how we can go through a deprecation.
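
(A minimal illustration of that point: float() on a size-1 ndarray happens to succeed, which is what the coercion above relies on, while larger arrays raise.)

```
import numpy as np

float(np.array([2]))       # 2.0 -- a size-1 array converts, so the nested bounds "work"
# float(np.array([2, 3]))  # TypeError: only size-1 arrays can be converted to Python scalars
```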

The pandas code with the dtypes I am not sure about; it may also be coercion related. It seems like we used to ignore errors here at some point, and now I know that it's necessary to support these classes as objects.

@seberg
Member
seberg commented Jul 24, 2020

Ah, on the pandas issue, we currently have this, so that explains the change:

In [1]: np.array(np.int64)       
ValueError: invalid __array_struct__

The two cases behave identically now, but went towards the error case.
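
(A short sketch of the distinction at play; the exact reprs may vary by version. Looked up on the class, __array_struct__ is an unbound getset descriptor, while on an instance it is the PyCapsule NumPy actually expects.)

```
import numpy as np

print(type(np.int64.__array_struct__))     # <class 'getset_descriptor'> -- class-level lookup
print(type(np.int64(0).__array_struct__))  # <class 'PyCapsule'>         -- real interface capsule
```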

@eric-wieser
Member

Was just about to comment the same: this has always been a problem for scalars; it's just that we made the behavior consistent for lists of them too.

@eric-wieser
Member

#8877 is very similar here; our problem is that we use x.__array_struct__ when we ought to be using something like type(x).__array_struct__.

@eric-wieser
Member

An easy fix would be to ignore __array_struct__ if it is of type getset_descriptor.
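
(A rough Python-level sketch of that suggestion, assuming the check happens where the attribute is fetched; the real check would live in C. types.GetSetDescriptorType is the standard-library handle for this type.)

```
import types
import numpy as np

def array_struct_or_none(obj):
    # Return the capsule when obj exposes a real __array_struct__,
    # or None when the lookup hit a class-level getset descriptor.
    attr = getattr(obj, "__array_struct__", None)
    if isinstance(attr, types.GetSetDescriptorType):
        return None
    return attr

print(array_struct_or_none(np.int64))     # None -- treat as a plain Python object
print(array_struct_or_none(np.int64(0)))  # <capsule object ...>
```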

@seberg
Member
seberg commented Jul 24, 2020

True, maybe that is the best solution. We may need to also let property pass, and I have to check if there was a similar change for __array_interface__.

@eric-wieser
Member

Perhaps checking for tp_get is sufficient?

@seberg
Member
seberg commented Jul 24, 2020

Hmmm, I guess so, although it would also be nice not to rely on inspecting slots? The closest things I could find in Python are these from inspect:

def ismethoddescriptor(object):
    """<snip>"""
    if isclass(object) or ismethod(object) or isfunction(object):
        # mutual exclusion
        return False
    tp = type(object)
    return hasattr(tp, "__get__") and not hasattr(tp, "__set__")

def isdatadescriptor(object):
    """<snip>"""
    if isclass(object) or ismethod(object) or isfunction(object):
        # mutual exclusion
        return False
    tp = type(object)
    return hasattr(tp, "__set__") or hasattr(tp, "__delete__")

Interestingly, it seems that tp_descr_get is always set to slot_tp_descr_get (for Python-defined types) until it is called the first time and sets itself to NULL. The upside of all of this is that, at least for __array_struct__, we do a CapsuleCheck first, so it doesn't matter too much if the checks here are slow...

seberg added a commit to seberg/numpy that referenced this issue Jul 24, 2020
This was previously allowed for nested cases, i.e.:

   np.array([np.int64])

but not for the scalar case:

   np.array(np.int64)

The solution is to align these two cases by always interpreting
these as not being array-likes (instead of invalid array-likes)
if the passed-in object is a `type` object and the special
attribute has a `__get__` attribute (and is thus probably a
property or method).

The (arguably better) alternative to this is to move the special
attribute lookup to be on the type instead of the instance
(which is what Python does). This will definitely require some
adjustments in our tests to use properties, but is probably
fine with respect to most actual code.
(The tests more commonly use this to quickly set up an
array-like, while it is a fairly strange pattern for typical code.)

Address parts of numpygh-16939
Closes numpygh-8877
@seberg
Member
seberg commented Jul 24, 2020

Opted for isinstance(arraylike, type) and hasattr(arraylike.special_attribute, "__get__")...
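
(In Python terms that check reads roughly like the sketch below; `special_attribute` here stands for e.g. `__array_struct__`, and the actual implementation is in C.)

```
import numpy as np

def attribute_is_class_level(arraylike, name="__array_struct__"):
    # Hedged sketch: treat the attribute as "not an array-like" when the
    # object is itself a type and the attribute it exposes is a descriptor
    # (i.e. has __get__) rather than an actual capsule.
    attr = getattr(arraylike, name, None)
    return isinstance(arraylike, type) and hasattr(attr, "__get__")

print(attribute_is_class_level(np.int64))     # True  -> coerce as a plain object
print(attribute_is_class_level(np.int64(0)))  # False -> genuine __array_struct__
```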

@seberg
Member
seberg commented Jul 24, 2020

As to the SciPy issues, I am wondering if we could just get away with it, but I have not thought much about trying to keep supporting them. Note that this really only affects np.array; assignments like arr[0] = arr and arr[0, ...] = arr could keep using reverse broadcasting (although the first case still only works by using float(arr), I admit).

If we expect this to be extremely rare, I would like it. But I admit I am worried that the error is very confusing when you look at it for code that previously worked.

I am not sure yet. I suppose I can guess that for basically all of our numerical types it will incidentally work out (and thus a DeprecationWarning would be in order), but otherwise the error needs to be raised immediately.
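
(For the item-assignment cases mentioned above, a small hedged illustration of what keeps working on released NumPy because the size-1 array is converted to a scalar.)

```
import numpy as np

arr = np.zeros(3)
arr[0] = np.array([5.0])   # succeeds: the size-1 array is converted via float()
print(arr)                 # [5. 0. 0.]
```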

seberg added a commit to seberg/numpy that referenced this issue Jul 24, 2020
Previously, code such as:
```
np.array([0, np.array([0])], dtype=np.int64)
```
worked by discovering the array as having a shape of `(2,)` and
then using `int(np.array([0]))` (which currently succeeds).
This is also a ragged array, and thus deprecated here earlier on.
(As a detail, in the new code the assignment should occur as
an array assignment and not using the `__int__`/`__float__`
Python protocols.)

Two details to note:

1. This still skips the deprecation for sequences which are not
   array-likes.  We assume that sequences will always fail the
   conversion to a number.
2. The conversion to a number (or string, etc.) may still fail
   after the DeprecationWarning is given.  This is not ideal, but
   narrowing it down seems tedious, since the actual assignment
   happens in a later step.
   I.e. `np.array([0, np.array([None])], dtype=np.int64)` will give
   the warning, but then fail anyway, since the array cannot be
   assigned.

Closes numpygh-16939
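
(For downstream code that hits the new DeprecationWarning, one explicit way to express the same intent, sketched here rather than taken from the PR, is to unwrap nested size-1 arrays before coercion.)

```
import numpy as np

values = [0, np.array([0])]
# Convert nested size-1 arrays to plain scalars explicitly instead of
# relying on the (now deprecated) implicit int()/float() fallback.
explicit = np.array([v.item() if isinstance(v, np.ndarray) else v for v in values],
                    dtype=np.int64)
print(explicit)  # [0 0]
```
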
@seberg
Member
seberg commented Jul 24, 2020

OK, gh-16943 addresses the SciPy issue. Not by supporting it, but by deprecating it for now. It's not perfect (you can construct things that skip the deprecation in theory, e.g. scipy sparse matrices might), and you may get the warning but then a failure, which is slightly annoying. But I think it would be fairly tricky to get it "right", and it is clear that this type of usage is incorrect. I.e. in my opinion it seems good enough to cover almost everything, given that covering everything would require mixing up two distinct steps of the array-coercion process.

@bashtage
Contributor

statsmodels with --pre is also failing with this.

@charris charris reopened this Jul 25, 2020
@charris
Member
charris commented Jul 25, 2020

Keeping issue open as it isn't clear we are done with it.

@bashtage
Contributor
bashtage commented Jul 28, 2020

This has not been fixed in master, in case it was thought that it had:

import numpy as np
a = np.empty(3,dtype="object")
a[:]=[np.int8, np.int16, np.int32]
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-6-00becd691fe7> in <module>
----> 1 a[:]=[np.int8, np.int16, np.int32]

ValueError: invalid __array_struct__

using 1.20.0.dev0+4690248, last commit date: Tue Jul 28 07:57:34 2020 +0500

@seberg
Member
seberg commented Jul 28, 2020

@bashtage gh-16941, which would solve this (and maybe also let a few other things pass), has not been merged yet.

@seberg
Member
seberg commented Nov 19, 2020

Closing as the PR is merged now. Please ping if I am missing something.

@seberg seberg closed this as completed Nov 19, 2020