Possibly unwanted behaviour of numpy.median when the array contains numpy.nan (Trac #2126) · Issue #586 · numpy/numpy · GitHub

Closed
numpy-gitbot opened this issue Oct 19, 2012 · 8 comments
Labels
00 - Bug component: numpy._core Priority: high High priority, also add milestones for urgent issues

Comments

@numpy-gitbot

Original ticket http://projects.scipy.org/numpy/ticket/2126 on 2012-05-04 by trac user koji, assigned to unknown.

Because the median function depends on the sort function, which places NaN entries at the end, the median function may overestimate the median when the array contains NaNs.

I was really surprised to see that In [38] returned 11.5. I was expecting either np.nan or 11.0.

Perhaps explicit handling of np.nan would be better: either remove the NaNs before sorting, or return np.nan when there are one or more NaNs in the array. It makes me wonder whether anybody has tripped over this without realising it.

In [33]: np.sort(np.array([np.nan, 10]))
Out[33]: array([ 10., nan])

In [34]: np.sort(np.array([np.nan, 10, 11]))
Out[34]: array([ 10., 11., nan])

In [35]: np.sort(np.array([np.nan, 10, 11, 12]))
Out[35]: array([ 10., 11., 12., nan])

In [36]: np.median(np.array([np.nan, 10]))
Out[36]: nan

In [37]: np.median(np.array([np.nan, 10, 11]))
Out[37]: 11.0

In [38]: np.median(np.array([np.nan, 10, 11, 12]))
Out[38]: 11.5

In [39]: np.__version__
Out[39]: '1.5.1'

Python 2.7.1 |EPD 7.0-1 (32-bit)| (r271:86832, Dec 3 2010, 15:41:32)
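
For reference, the effect can be reproduced independently of the NumPy version with a minimal sketch of a sort-based median. The helper `sort_based_median` is hypothetical and written only for illustration; it is not NumPy's actual implementation, but it relies on the same property shown in the session above, namely that `np.sort` places NaN entries at the end.

```python
import numpy as np

def sort_based_median(values):
    """Hypothetical helper: median via a plain sort, with no NaN handling."""
    s = np.sort(np.asarray(values, dtype=float))  # NaNs end up last
    n = s.size
    # Average the two middle elements (they coincide when n is odd).
    return 0.5 * (s[(n - 1) // 2] + s[n // 2])

print(sort_based_median([np.nan, 10]))          # NaN is one of the two middle elements
print(sort_based_median([np.nan, 10, 11]))      # NaN is treated as the largest value
print(sort_based_median([np.nan, 10, 11, 12]))  # middle pair is (11, 12)
```

The NaN only reaches the result when it happens to land in a middle position, which is why the two-element case gives nan while the longer arrays silently give a shifted value.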

@empeeu
Contributor
empeeu commented Feb 12, 2014

I've been wanting to participate in an open source project. I ran into this problem recently so I figured, why not fix it! Hopefully I'll have this sorted soon.

@njsmith
Member
njsmith commented Feb 12, 2014

IMHO the correct behaviour is to return NaN when there are NaNs present.

@empeeu
Contributor
empeeu commented Feb 12, 2014

I agree. But what if you're taking the median over an axis? Should you return NaN only for the slices that contain NaNs, or return all NaNs? Additionally, it would be nice to have an ignore_nan flag for cases where you still want a useful median. In that case, NaNs should not count toward the total number of elements in the array. How does that sound?

@empeeu
Contributor
empeeu commented Feb 13, 2014

Actually, you probably don't want an ignore_nan flag, but instead a nanmedian function, much like nansum, nanmax, etc.
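
A NaN-ignoring median along those lines could look roughly like the sketch below. The name `median_ignoring_nan` is hypothetical, the helper only handles flattened input, and returning NaN for an all-NaN input is an assumption rather than something decided in this thread.

```python
import numpy as np

def median_ignoring_nan(values):
    """Hypothetical helper: drop NaNs first, then take the median of the rest."""
    a = np.asarray(values, dtype=float).ravel()
    kept = a[~np.isnan(a)]  # NaNs no longer count toward the element total
    if kept.size == 0:
        return np.nan  # assumption: an all-NaN input has no meaningful median
    s = np.sort(kept)
    n = s.size
    # Average the two middle elements (they coincide when n is odd).
    return 0.5 * (s[(n - 1) // 2] + s[n // 2])

print(median_ignoring_nan([np.nan, 10, 11, 12]))  # median of 10, 11, 12
```

Keeping this as a separate function rather than a flag leaves the plain median free to signal the presence of NaNs.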

@njsmith
Member
njsmith commented Feb 13, 2014

If going over an axis, there should be nan output for slices that have
NaNs, and the actual median for slices that don't have NaNs.
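
The per-slice rule described here can be sketched as follows. The function `median_propagating_nan` is a hypothetical illustration, not the code that was eventually merged: it computes a sort-based median along one axis and then overwrites the result with NaN for every slice that contains a NaN.

```python
import numpy as np

def median_propagating_nan(a, axis=-1):
    """Hypothetical helper: NaN for slices containing NaN, the median otherwise."""
    a = np.asarray(a, dtype=float)
    has_nan = np.isnan(a).any(axis=axis)   # one flag per slice
    s = np.sort(a, axis=axis)              # NaNs are sorted to the end of each slice
    n = s.shape[axis]
    lo = np.take(s, (n - 1) // 2, axis=axis)
    hi = np.take(s, n // 2, axis=axis)
    med = 0.5 * (lo + hi)                  # plain sort-based median per slice
    return np.where(has_nan, np.nan, med)  # mask out the slices with NaNs

data = [[1.0, 2.0, 3.0],
        [np.nan, 10.0, 11.0]]
print(median_propagating_nan(data, axis=1))  # first row is clean, second contains a NaN
```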

@shoyer
Member
shoyer commented Feb 15, 2014

@empeeu Note that there is a nanmedian function already in scipy: http://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.nanmedian.html

@juliantaylor
Contributor

Don't copy that, it's not very good.

empeeu added a commit to empeeu/numpy that referenced this issue Mar 8, 2014
empeeu added a commit to empeeu/numpy that referenced this issue Mar 16, 2014
empeeu added a commit to empeeu/numpy that referenced this issue Apr 7, 2015
empeeu added a commit to empeeu/numpy that referenced this issue Jun 22, 2015
…esent in array to close issue numpy#586.

Also added unit tests.
@njsmith
Member
njsmith commented Oct 17, 2015

Looks like this was fixed by #5753 and no-one noticed, so closing.
