8000 BUG: masked-array median of 1d array should be scalar by AmitAronovitch · Pull Request #7592 · numpy/numpy · GitHub
[go: up one dir, main page]

Skip to content

BUG: masked-array median of 1d array should be scalar #7592

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 2 commits into from

Conversation

AmitAronovitch
Copy link
Contributor

It seems like PR #4760, which improved the n-dimensional performance of ma.median, has broken the behavior for the 1d case. The previous behavior seems more correct, because:

(a) It matches the behavior of the non-masked median.
(b) It matches the examples given in the doc-strings.
(c) It matches a simpler rule: a median of n-dim array is n-1 dimensional.

This pull request fixes that, and adds a test that fails in the unfixed code.

Details:
It appears that the issue can be traced back to a special-case in numpy's "advanced indexing" algorithm, where a list of scalars is treated as a 1d array.
The fix works by avoiding this special case. The items of the index are forced to be arrays (in particular, scalars are converted to 0d arrays), so advanced indexing works in a consistent manner.
I wrote it this way to explicitly highlight the fact that the 1d result should be consistent with higher dimensional cases.
However, it is possible that an "if dim==1" would be more readable...

@AmitAronovitch
Copy link
Contributor Author

I noticed that this issue was also reported in #5969. Should move discussion to there.

@ahaldane
Copy link
Member
ahaldane commented May 1, 2016

2 comments:

First, about advanced indexing, you are right that there is a bug here due to the fact that numpy treats arr[[1]] and arr[[np.array(1)]] differently. The reason is that the latter case is not really legal (there is a plan in #4434 to disallow it), and numpy makes a heuristic guess that the user actually indended to write arr[(np.array(1),)], ie, a tuple, which is treated differently. That's what we want, but the heuristic fails to trigger in the 1-d case. So in this PR I think that instead of converting h and odd using np.asarray it is probably better to change the indexing operations to low = asorted[tuple(ind)], and same for high. That should give the same effect without needing asarray.

Second, I think the behavior can be further improved. The way you have it now median returns a 0-d array when called on a 1-d array, but I think it would be better to simply return a scalar. I think this might easily achieved by changing return r to return r[()] in median, since indexing with () converts 0-d arrays to scalars (properly accounting for masked in maskedarrays) but leaves other arrays alone. (But note that indexing with () has a bug that it does not work with string types, see #7267, but that does not matter here since median doesn't work for strings anyway).

low = asorted[ind]
low._sharedmask = False
if hasattr(low,"mask"):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer if isinstance(low, MaskedArray): I think.

@AmitAronovitch
Copy link
Contributor Author
AmitAronovitch commented May 1, 2016

Thanks for the comments.
Since we do not want to affect the performance (or correctness) of the n-dimensional case, I think it would be best to special-case the 1d case (if len(asorted.shape)==1: ) at the top of the function.
The 1d case would probably become shorter and simpler - I might even copy it directly from the old version.
p.s. from your comment, seems like the casting of ind into tuple might be required anyways, for future compatibility...

@seberg
Copy link
Member
seberg commented May 1, 2016

Adding a tuple call to the indexing, will not affect anything except the speed of the tuple call itself. Making sure you index with a tuple instead of a list is the clean and correct solution for all dimensions.

@charris charris added this to the 1.11.1 release milestone May 2, 2016
@charris
Copy link
Member
charris commented May 5, 2016

Be good to finish this up.

@charris
Copy link
Member
charris commented May 9, 2016

Last call for getting this fixed for 1.11.1.

@charris charris changed the title masked-array median of 1d array should return scalar BUG: masked-array median of 1d array should be scalar May 9, 2016
@AmitAronovitch
Copy link
Contributor Author

sry did not manage to get to it.
others: feel free to go on if you prefer not to wait - I do not know the schedule for 1.11.1, but can't promise I'll have time for this before the weekend.

@charris
Copy link
Member
charris commented May 14, 2016

I am going to do 1.11.1 this weekend (May 15).

@AmitAronovitch
Copy link
Contributor Author
AmitAronovitch commented May 14, 2016

I have implemented an alternate fix (#7635), by explicitly checking for the 1d case.
This restores the original behavior of returning a scalar in the 1d case, and avoids any changes in the n-dimensional case.
We might still want to assure tuple indexing ( asorted[tuple(ind)] ) as suggested above, but that may be considered a separate issue than the topic of this bug.

ahaldane added a commit that referenced this pull request May 22, 2016
@homu
Copy link
Contributor
homu commented May 22, 2016

☔ The latest upstream changes (presumably #7635) made this pull request unmergeable. Please resolve the merge conflicts.

@charris
Copy link
Member
charris commented May 22, 2016

Closing as #7635 has been merged.

@charris charris closed this May 22, 2016
@charris charris removed this from the 1.11.1 release milestone May 22, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants
0