8000 BUG/TST: Fix for #6723 including test: force fill_value.ndim==0 by gerritholl · Pull Request #6728 · numpy/numpy · GitHub
[go: up one dir, main page]

Skip to content

BUG/TST: Fix for #6723 including test: force fill_value.ndim==0 #6728

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

gerritholl
Copy link
Contributor

Fix issue #6723. Given an exotic masked structured array, where one of
the fields has a multidimensional dtype, make sure that, when accessing
this field, the fill_value still makes sense. As it stands prior to this
commit, the fill_value will end up being multidimensional, possibly with a
shape incompatible with the mother array, which leads to broadcasting
errors in methods such as .filled(). This commit uses the first element
of this multidimensional fill value as the new fill value.

Also add a test to verify that fill_value.ndim remains 0 after indexing.

@gerritholl gerritholl force-pushed the structured_multidim_masked_array_fillvalue branch from 9f8f739 to f667c81 Compare December 1, 2015 14:08
@@ -3121,6 +3121,13 @@ def __getitem__(self, indx):
if isinstance(indx, basestring):
if self._fill_value is not None:
dout._fill_value = self._fill_value[indx]
# If we're indexing a multidimensional field in a
# structured array (such as dtype("(2)i2,(2)i1")),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be clearer if written dtype("(2,)i2,(2,)i1"). Current success looks like a parse oddity.

@gerritholl gerritholl force-pushed the structured_multidim_masked_array_fillvalue branch 2 times, most recently from fd68517 to 8d1ce6e Compare December 2, 2015 00:53
@ahaldane
Copy link
Member
ahaldane commented Dec 2, 2015

It's a messy situation, and this may be the best solution.

Another option that comes to mind is that maskedarray could always broadcast the fill value to the data shape. Perhaps this can be achieved by a few judicious uses of broadcast_to in MaskedArray.filled? This way if you create an N x M masked array, you can also set the fill value to a 1d array of length M (or shape (N,1) or (1,M))

@gerritholl
Copy link
Contributor Author

@ahaldane Is it sure that such broadcasting is always possible? My gut feeling says it is, but my gut feeling is not very reliable. I'm still not clear what practical purpose it serves at all for a user to change the fill_value.

@ahaldane
Copy link
Member
ahaldane commented Dec 4, 2015

It's probably possible, but after thinking I'm pretty sure it will open up many new cans of worms and would take effort. For instance, we would have to worry about slicing the fill_value if the array gets sliced. I think this PR is a fair fix for this rare case, for now.

If/when we end up reimplementing masked arrays to fix all sorts of issues like this (there have been suggestions to do so once __numpy_ufunc__ behavior is decided) it might be worth thinking about allowing both the mask and the fill_value to broadcast, if it's not too messy. It might actually be convenient, eg we could mask out entire rows & columns easily.

I looked at the code in this PR, it looks good to me. I vote for merging.

@seberg
Copy link
Member
seberg commented Dec 5, 2015

@ahaldane, just the side discussion.... But I tend to think right now that the "correct" solution from a future point of view would be to not allow it at all, each dtype should have a single mask value (of course only the last layer of a dtype has multiple dtypes inside).
Dtype field arrays belong to a single dtype and in some sense this dtype is associated with the fill value, if we come from the "dtypes should be the carriers of mask" side.

Fix issue numpy#6723.  Given an exotic masked structured array, where one of
the fields has a multidimensional dtype, make sure that, when accessing
this field, the fill_value still makes sense.  As it stands prior to this
commit, the fill_value will end up being multidimensional, possibly with
a shape incompatible with the mother array, which leads to broadcasting
errors in methods such as .filled().  This commit uses the first element
of this multidimensional fill value as the new fill value.  When more
than one unique value existed in fill_value, a warning is issued.

Also add a test to verify that fill_value.ndim remains 0 after indexing.
@gerritholl gerritholl force-pushed the structured_multidim_masked_array_fillvalue branch from 8d1ce6e to 090e85e Compare December 7, 2015 16:45
@charris
Copy link
Member
charris commented Jan 12, 2016

@ahaldane @seberg So this should go in? If so, one of you can do the honors.

@ahaldane
Copy link
Member

Well, I don't think @seberg objected above, and looking over it again it seems like a reasonable band-aid. The most important part is just to warn the user if something strange is happening. Merging.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants
0