BUG: MaskedArray with nested dtype and object elements cause AttributeError on access · Issue #15895 · numpy/numpy · GitHub

BUG: MaskedArray with nested dtype and object elements cause AttributeError on access #15895


Closed
ttmapr opened this issue Apr 2, 2020 · 6 comments · Fixed by #15949

ttmapr commented Apr 2, 2020

When I create a MaskedArray using a nested dtype which contains an object at the innermost level, trying to access it causes an AttributeError: 'str' object has no attribute 'ndim'.

If I use int instead of object, the element access works fine.

The issue seems to be caused by the assumption that _fill_value is of a numpy type and has an ndim attribute.
This assumption is true for numpy.int32 which would be used if int was specified inside the dtype.
But for object, the _fill_value will be just a Python str '?', which has no ndim attribute.

I propose to change the code line in question to

if hasattr(dout._fill_value, 'ndim') and dout._fill_value.ndim > 0:
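The difference between the two fill-value types can be checked in isolation. The sketch below is only an illustration of the assumption described above: `has_array_dims` is a hypothetical helper name that mirrors the proposed guard, not a function that exists in NumPy.

```python
import numpy as np

numpy_scalar = np.int32(0)  # NumPy scalar, as an int field would produce
python_str = "?"            # plain Python str, as reported for the object field

print(hasattr(numpy_scalar, "ndim"))  # NumPy scalars expose ndim
print(hasattr(python_str, "ndim"))    # a plain str does not


def has_array_dims(value):
    """Hypothetical helper mirroring the proposed hasattr guard."""
    return hasattr(value, "ndim") and value.ndim > 0
```

With this guard, a plain str is simply treated as having no extra dimensions instead of raising.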

Reproducing code example:

import numpy as np
my_dtype = np.dtype([('b', [('c', object)], (1,))])
a = np.ma.masked_all((1,), my_dtype)
print(a['b']['c'])

Error message:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Python37\lib\site-packages\numpy\ma\core.py", line 3276, in __getitem__
    if dout._fill_value.ndim > 0:
AttributeError: 'str' object has no attribute 'ndim'
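To check whether an installed NumPy is affected without a crash, the reproducer can be wrapped as below. This is a minimal sketch; `nested_object_access_outcome` is a hypothetical name, and the result depends on the NumPy version in use.

```python
import numpy as np


def nested_object_access_outcome():
    """Return None if the access succeeds, else the exception class name."""
    my_dtype = np.dtype([("b", [("c", object)], (1,))])
    a = np.ma.masked_all((1,), my_dtype)
    try:
        a["b"]["c"]
    except AttributeError as exc:
        return type(exc).__name__
    return None


outcome = nested_object_access_outcome()
print(np.__version__, outcome)
```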

Numpy/Python version information:

1.18.1 3.7.4 (tags/v3.7.4:e09359112e, Jul 8 2019, 20:34:20) [MSC v.1916 64 bit (AMD64)]

ttmapr changed the title from "BUG: MaskedArray with nested dtype and object leafs throws exception" to "BUG: MaskedArray with nested dtype and object elements cause AttributeError on access" on Apr 2, 2020
Member
vrakesh commented Apr 3, 2020

Nice catch

But shouldn't we just check whether _fill_value is not a str rather than use hasattr? Even then it feels hacky; there seems to be a comment regarding issue #6723 above the check.

Is this the right approach, @seberg? Trying to understand the background better.

Member
seberg commented Apr 3, 2020

Well, it's a bit of a mess. But aside from that, you are right of course. Checking for ndim has a bad code smell (and could definitely fail). Even the current code can, in theory, do something incorrect for certain object cases.
I think the solution has to be to test for this happening more directly. One thing I am considering is changing it along these lines:

new_fill_value = self._fill_value[indx, ...]  # guarantee array return

and moving the flat line out of the if. Or, alternatively, adding an else: dout._fill_value = new_fill_value[()].
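For context, the "guarantee array return" comment relies on a general ndarray indexing property, sketched below on a plain integer array. The names are illustrative only, and this covers positional indexing, not the structured field access that this issue turns out to involve.

```python
import numpy as np

values = np.array([10, 20, 30])

as_scalar = values[1]      # integer index alone yields a NumPy scalar
as_array = values[1, ...]  # trailing Ellipsis yields a 0-d array instead
unwrapped = as_array[()]   # empty-tuple index unwraps a 0-d array to a scalar

print(type(as_scalar), type(as_array), as_array.ndim, type(unwrapped))
```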

@eric-wieser
Member

So, the purpose of _fill_value was to always be an array, unlike .fill_value which is the scalar.

Do we know how _fill_value is ending up with the wrong type?
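The array-versus-scalar distinction can be inspected on a simple, non-structured masked array. This is a sketch only: `_fill_value` is a private attribute, so its exact type is an implementation detail that may differ between NumPy versions.

```python
import numpy as np

m = np.ma.masked_array([1, 2, 3], mask=[False, True, False], fill_value=9)

public = m.fill_value    # documented attribute, intended to be the scalar
private = m._fill_value  # private attribute, intended to always be an array

print(type(public), type(private))
```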

Member
seberg commented Apr 3, 2020

Ah sorry, this is of course structured access, which is the problem (so the ... suggestion will not work on its own). In any case, this code branch seems to not know about _fill_value being guaranteed to be an array, and it has to make sure of that.

Probably the best thing to do is to get the new dtype directly with dtype = self._fill_value.dtype[indx], then use dtype.ndim to check for that special branch. Otherwise, create a new empty array with that dtype and assign to it, so that the new _fill_value is again an array.
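A rough sketch of the two ingredients mentioned here, using the dtype from the original report in place of self._fill_value.dtype. Variable names are illustrative, and this is not the code that was eventually merged.

```python
import numpy as np

my_dtype = np.dtype([("b", [("c", object)], (1,))])

field_b = my_dtype["b"]     # subarray dtype of field 'b', shape (1,)
leaf_c = field_b.base["c"]  # dtype of the innermost field 'c'

print(field_b.ndim, field_b.shape)  # dimensionality added by the subarray
print(leaf_c.ndim)                  # no extra dimensions at the leaf

# Build a 0-d array of the leaf dtype and assign into it, so the
# result is an ndarray even though the element is a plain Python str.
new_fill = np.empty((), dtype=leaf_c)
new_fill[()] = "?"
print(type(new_fill), new_fill.ndim)
```

Checking ndim on the dtype avoids asking the fill value itself, which is what fails when the value is a plain str.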

Member
vrakesh commented Apr 11, 2020

I have looked into the issue a little more and now have a better understanding of masked arrays.

Would the solution be something like this, @seberg?

                    if self._fill_value.dtype[indx].ndim > 0:
                        if not (dout._fill_value ==
                                dout._fill_value.flat[0]).all():
                            warnings.warn(
                                "Upon accessing multidimensional field "
                                f"{indx!s}, need to keep dimensionality "
                                "of fill_value at 0. Discarding "
                                "heterogeneous fill_value and setting "
                                f"all to {dout._fill_value[0]!s}.",
                                stacklevel=2)
                        dout._fill_value = dout._fill_value.flat[0]
                    else:
                        dout._fill_value = array(self._fill_value[indx])

Member
seberg commented Apr 11, 2020

Yeah, something along those lines, although I did not check carefully, and there may be some other tricky stuff lurking.
