ValueError: invalid __array_struct__ when assigning to object-dtype ndarray · Issue #16939 · numpy/numpy · GitHub

ValueError: invalid __array_struct__ when assigning to object-dtype ndarray #16939

Closed
TomAugspurger opened this issue Jul 24, 2020 · 19 comments · Fixed by #16943

@TomAugspurger
TomAugspurger commented Jul 24, 2020

Hi. Pandas is broken with numpy master :)

Reproducing code example:

In [1]: import numpy as np
In [2]: out = np.empty(3, dtype="object")
In [3]: out[:] = [np.int64, np.int64, np.int64]

Error message:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-3-d20bd885ed9b> in <module>
----> 1 out[:] = [np.int64, np.int64, np.int64]

ValueError: invalid __array_struct__

Numpy/Python version information:

Installed from a wheel with pip install -i https://pypi.anaconda.org/scipy-wheels-nightly/simple numpy

1.20.0.dev0+2d12d0c 3.7.6 (default, Dec 30 2019, 19:38:28)
[Clang 11.0.0 (clang-1100.0.33.16)]

I don't immediately see what commit could have caused this. #16850 mentions object-dtype scalars, but it doesn't look quite like the right one.
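
(For reference, a minimal sketch of what the same assignment produces on a released NumPy such as 1.19.x, where it does not raise; the dtype classes are simply stored as Python objects.)

```
import numpy as np

# On a released NumPy (e.g. 1.19.x) this assignment succeeds and the
# type objects are stored as-is in the object-dtype array.
out = np.empty(3, dtype="object")
out[:] = [np.int64, np.int64, np.int64]
print(out)
# [<class 'numpy.int64'> <class 'numpy.int64'> <class 'numpy.int64'>]
```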

@rgommers
Member

The SciPy builds on TravisCI with NUMPYSPEC='--pre' also just started failing (e.g. https://travis-ci.org/github/scipy/scipy/jobs/711362812, with stats broadcasting, test_methods_with_lists, and TestDifferentialEvolutionSolver.test_args_tuple_is_passed failures); this may be related.

@charris
Member
charris commented Jul 24, 2020

Probably because I fixed the nightly Cython-related build failures and triggered new builds yesterday. The nightly builds had stalled out :)

@charris
Member
charris commented Jul 24, 2020

@seberg Thoughts?

@seberg
Member
seberg commented Jul 24, 2020

I am a bit surprised I did not notice additional SciPy failures when comparing to master. The SciPy ones are reverse broadcasting; I thought I covered that, but apparently I missed something.

The problem is this type of code:

bounds     = [(-np.inf, np.inf), (np.array([2]), np.array([3]))]
np.array(bounds)

which only works because float(np.array([2])) happens to work for NumPy arrays specifically. I honestly think those failures are better, but I have to see how we can go through a deprecation.
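
(A minimal illustration of that point: float() on a size-1 ndarray happens to succeed, which is what the coercion above relies on, while larger arrays raise.)

```
import numpy as np

float(np.array([2]))       # 2.0 -- a size-1 array converts, so the nested bounds "work"
# float(np.array([2, 3]))  # TypeError: only size-1 arrays can be converted to Python scalars
```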

The pandas code with the dtypes I am not sure about; it may also be coercion related. It seems like we used to ignore errors here at some point, and now I know that it's necessary to support these classes as objects.

@seberg
Member
seberg commented Jul 24, 2020

Ah, on the pandas issue, we currently have this, so that explains the change:

In [1]: np.array(np.int64)       
ValueError: invalid __array_struct__

The two cases behave identically now, but went towards the error case.
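
(A short sketch of the distinction at play; the exact reprs may vary by version. Looked up on the class, __array_struct__ is an unbound getset descriptor, while on an instance it is the PyCapsule NumPy actually expects.)

```
import numpy as np

print(type(np.int64.__array_struct__))     # <class 'getset_descriptor'> -- class-level lookup
print(type(np.int64(0).__array_struct__))  # <class 'PyCapsule'>         -- real interface capsule
```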

@eric-wieser
Member

Was just about to comment the same: this has always been a problem for scalars; it's just that we made the behavior consistent for lists of them too.

@eric-wieser
Member

#8877 is very similar here; our problem is that we use x.__array_struct__ when we ought to be using something like type(x).__array_struct__.

@eric-wieser
Member

An easy fix would be to ignore __array_struct__ if it is of type getset_descriptor.
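
(A rough Python-level sketch of that suggestion, assuming the check happens where the attribute is fetched; the real check would live in C. types.GetSetDescriptorType is the standard-library handle for this type.)

```
import types
import numpy as np

def array_struct_or_none(obj):
    # Return the capsule when obj exposes a real __array_struct__,
    # or None when the lookup hit a class-level getset descriptor.
    attr = getattr(obj, "__array_struct__", None)
    if isinstance(attr, types.GetSetDescriptorType):
        return None
    return attr

print(array_struct_or_none(np.int64))     # None -- treat as a plain Python object
print(array_struct_or_none(np.int64(0)))  # <capsule object ...>
```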

@seberg
Member
seberg commented Jul 24, 2020

True, maybe that is the best solution. We may need to also let property pass, and I have to check if there was a similar change for __array_interface__.

@eric-wieser
Member

Perhaps checking for tp_get is sufficient?

@seberg
Member
seberg commented Jul 24, 2020

Hmmm, I guess so, although it would also be nice not to rely on inspecting slots? The closest things I could find in Python are these from inspect:

def ismethoddescriptor(object):
    """<snip>"""
    if isclass(object) or ismethod(object) or isfunction(object):
        # mutual exclusion
        return False
    tp = type(object)
    return hasattr(tp, "__get__") and not hasattr(tp, "__set__")

def isdatadescriptor(object):
    """<snip>"""
    if isclass(object) or ismethod(object) or isfunction(object):
        # mutual exclusion
        return False
    tp = type(object)
    return hasattr(tp, "__set__") or hasattr(tp, "__delete__")

Interestingly, it seems that tp_descr_get is always set to slot_tp_descr_get (for Python-defined types) until it is called the first time and sets itself to NULL. The upside of all of this is that, at least for __array_struct__, we do a CapsuleCheck first, so it doesn't matter too much if the checks here are slow...

seberg added a commit to seberg/numpy that referenced this issue Jul 24, 2020
This was previously allowed for nested cases, i.e.:

   np.array([np.int64])

but not for the scalar case:

   np.array(np.int64)

The solution is to align these two cases by always interpreting
these as not being array-likes (instead of invalid array-likes)
if the passed-in object is a `type` object and the special
attribute has a `__get__` attribute (and is thus probably a
property or method).

The (arguably better) alternative to this is to move the special
attribute lookup to be on the type instead of the instance
(which is what Python does). This will definitely require some
adjustments in our tests to use properties, but is probably
fine with respect to most actual code.
(The tests more commonly use this to quickly set up an
array-like, while it is a fairly strange pattern for typical code.)

Address parts of numpygh-16939
Closes numpygh-8877
@seberg
Member
seberg commented Jul 24, 2020

Opted for isinstance(arraylike, type) and hasattr(arraylike.special_attribute, "__get__")...
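
(In Python terms that check reads roughly like the sketch below; `special_attribute` here stands for e.g. `__array_struct__`, and the actual implementation is in C.)

```
import numpy as np

def attribute_is_class_level(arraylike, name="__array_struct__"):
    # Hedged sketch: treat the attribute as "not an array-like" when the
    # object is itself a type and the attribute it exposes is a descriptor
    # (i.e. has __get__) rather than an actual capsule.
    attr = getattr(arraylike, name, None)
    return isinstance(arraylike, type) and hasattr(attr, "__get__")

print(attribute_is_class_level(np.int64))     # True  -> coerce as a plain object
print(attribute_is_class_level(np.int64(0)))  # False -> genuine __array_struct__
```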

@seberg
Member
seberg commented Jul 24, 2020

As to the SciPy issues, I am wondering if we could just get away with it, but I have not thought much about trying to keep supporting them. Note that this really only affects np.array; assignments like arr[0] = arr and arr[0, ...] = arr could keep using reverse broadcasting (although the first case still only works by using float(arr), I admit).

If we expect this to be extremely rare, I would like it. But I admit I am worried that the error is very confusing when you look at it for code that previously worked.

I am not sure yet. I suppose I can guess that for basically all of our numerical types it will incidentally work out (and thus a DeprecationWarning would be in order), but otherwise the error needs to be raised immediately.
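
(For the item-assignment cases mentioned above, a small hedged illustration of what keeps working on released NumPy because the size-1 array is converted to a scalar.)

```
import numpy as np

arr = np.zeros(3)
arr[0] = np.array([5.0])   # succeeds: the size-1 array is converted via float()
print(arr)                 # [5. 0. 0.]
```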

seberg added a commit to seberg/numpy that referenced this issue Jul 24, 2020
Previously, code such as:
```
np.array([0, np.array([0])], dtype=np.int64)
```
worked by discovering the array as having a shape of `(2,)` and
then using `int(np.array([0]))` (which currently succeeds).
This is also a ragged array, and thus deprecated here earlier on.
(As a detail, in the new code the assignment should occur as
an array assignment and not using the `__int__`/`__float__`
Python protocols.)

Two details to note:

1. This still skips the deprecation for sequences which are not
   array-likes.  We assume that sequences will always fail the
   conversion to a number.
2. The conversion to a number (or string, etc.) may still fail
   after the DeprecationWarning is given.  This is not ideal, but
   narrowing it down seems tedious, since the actual assignment
   happens in a later step.
   I.e. `np.array([0, np.array([None])], dtype=np.int64)` will give
   the warning, but then fail anyway, since the array cannot be
   assigned.

Closes numpygh-16939
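
(For downstream code that hits the new DeprecationWarning, one explicit way to express the same intent, sketched here rather than taken from the PR, is to unwrap nested size-1 arrays before coercion.)

```
import numpy as np

values = [0, np.array([0])]
# Convert nested size-1 arrays to plain scalars explicitly instead of
# relying on the (now deprecated) implicit int()/float() fallback.
explicit = np.array([v.item() if isinstance(v, np.ndarray) else v for v in values],
                    dtype=np.int64)
print(explicit)  # [0 0]
```
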
@seberg
Member
seberg commented Jul 24, 2020

OK, gh-16943 addresses the SciPy issue. Not by supporting it, but by deprecating it for now. It's not perfect (you can construct things that skip the deprecation in theory, e.g. scipy sparse matrices might), and you may get the warning but then a failure, which is slightly annoying. But I think it would be fairly tricky to get it "right", and it is clear that this type of usage is incorrect. I.e. in my opinion it seems good enough to cover almost everything, given that covering everything would require mixing up two distinct steps of the array-coercion process.

@bashtage
Contributor

statsmodels with --pre is also failing with this.

@charris charris reopened this Jul 25, 2020
@charris
Member
charris commented Jul 25, 2020

Keeping issue open as it isn't clear we are done with it.

@bashtage
Contributor
bashtage commented Jul 28, 2020

This has not been fixed in master, in case it was thought that it had:

import numpy as np
a = np.empty(3,dtype="object")
a[:]=[np.int8, np.int16, np.int32]
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-6-00becd691fe7> in <module>
----> 1 a[:]=[np.int8, np.int16, np.int32]

ValueError: invalid __array_struct__

using 1.20.0.dev0+4690248, last commit date: Tue Jul 28 07:57:34 2020 +0500

@seberg
Member
seberg commented Jul 28, 2020

@bashtage gh-16941, which would solve this (and maybe also let a few other things pass), has not been merged yet.

@seberg
Member
seberg commented Nov 19, 2020

Closing as the PR is merged now. Please ping if I am missing something.

@seberg seberg closed this as completed Nov 19, 2020