Possibly unwanted behaviour of numpy.median when the array contains numpy.nan (Trac #2126) #586
I've been wanting to contribute to an open-source project, and I ran into this problem recently, so I figured: why not fix it? Hopefully I'll have this sorted soon.
IMHO the correct behaviour is to return NaN when there are NaNs present.
I agree. But what if you're taking the median over an axis? Should only the slices that contain NaNs return NaN, or should the whole result be NaN? Additionally, it would be nice to have an ignore_nan flag for cases where you still want a useful median. In that case, NaNs should not count toward the total number of elements. How does that sound?
Actually, you probably don't want an ignore_nan flag, but rather a nanmedian function, like nansum, nanmax, etc.
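For reference, later NumPy releases do provide such a function, `np.nanmedian`, next to `np.nansum` and `np.nanmax`. The sketch below assumes NumPy 1.9 or newer and shows the behaviour suggested above: NaNs are skipped and do not count toward the number of elements, and each slice along an axis is handled independently.

```python
import numpy as np

# Same data as the original report: one NaN among three real values.
a = np.array([np.nan, 10, 11, 12])

# nanmedian ignores the NaN, so this is the median of [10, 11, 12].
print(np.nanmedian(a))  # 11.0

# Over an axis, NaNs are dropped per slice (here, per row).
b = np.array([[np.nan, 10, 11, 12],
              [1.0, 2.0, 3.0, 4.0]])
# First row -> median of [10, 11, 12]; second row -> median of [1, 2, 3, 4].
print(np.nanmedian(b, axis=1))
```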
If going over an axis, there should be NaN output for slices that have NaNs.
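A minimal sketch of the per-slice behaviour described above, assuming a NumPy release that includes the fix mentioned later in this thread: a row containing a NaN yields NaN, while the other rows are unaffected. Some NumPy versions also emit a RuntimeWarning in this case, which the sketch silences.

```python
import warnings
import numpy as np

b = np.array([[np.nan, 10, 11, 12],
              [1.0, 2.0, 3.0, 4.0]])

# Some NumPy versions warn when a slice contains NaN; ignore that here
# so only the result is shown.
with warnings.catch_warnings():
    warnings.simplefilter("ignore", RuntimeWarning)
    row_medians = np.median(b, axis=1)

# Row 0 contains a NaN, so its median is NaN; row 1 is (2 + 3) / 2.
print(row_medians)
```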
@empeeu Note that there is a nanmedian function already in scipy: http://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.nanmedian.html
Don't copy that; it's not very good.
to close issue numpy#586. Also added unit test.
…esent in array to close issue numpy#586. Also added unit tests.
Looks like this was fixed by #5753 and no-one noticed, so closing.
Original ticket http://projects.scipy.org/numpy/ticket/2126 on 2012-05-04 by trac user koji, assigned to unknown.
Because the median function relies on sorting, and np.sort places NaN entries at the end, median can overestimate the true median when the array contains NaNs.
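To make the mechanism concrete, here is a simplified model of a sort-based median with no NaN handling. It is not NumPy's actual code, and `naive_sorted_median` is a hypothetical helper: it sorts with `np.sort` (which moves NaNs to the end) and then picks the middle element(s) by position, so the NaN shifts the "middle" to the right.

```python
import numpy as np

def naive_sorted_median(values):
    """Hypothetical helper: a median that relies only on np.sort and
    positional indexing, with no NaN handling."""
    s = np.sort(np.asarray(values, dtype=float))  # NaNs end up last
    n = s.size
    mid = n // 2
    if n % 2 == 1:
        # Odd length: the single middle element.
        return s[mid]
    # Even length: average of the two middle elements.
    return (s[mid - 1] + s[mid]) / 2.0

print(naive_sorted_median([np.nan, 10]))          # nan  (10 averaged with NaN)
print(naive_sorted_median([np.nan, 10, 11]))      # 11.0 (ignoring NaN would give 10.5)
print(naive_sorted_median([np.nan, 10, 11, 12]))  # 11.5 (ignoring NaN would give 11.0)
```

These three results match the outputs in the session below.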
I was really surprised to see that In [38] returned 11.5. I was expecting either np.nan or 11.0.
Perhaps explicit handling of np.nan would be better: either remove NaNs before sorting, or make median return np.nan when there are one or more NaNs in the array. It makes me wonder whether anybody has tripped over this without realising it.
In [33]: np.sort(np.array([np.nan, 10]))
Out[33]: array([ 10., nan])
In [34]: np.sort(np.array([np.nan, 10, 11]))
Out[34]: array([ 10., 11., nan])
In [35]: np.sort(np.array([np.nan, 10, 11, 12]))
Out[35]: array([ 10., 11., 12., nan])
In [36]: np.median(np.array([np.nan, 10]))
Out[36]: nan
In [37]: np.median(np.array([np.nan, 10, 11]))
Out[37]: 11.0
In [38]: np.median(np.array([np.nan, 10, 11, 12]))
Out[38]: 11.5
In [39]: np.__version__
Out[39]: '1.5.1'
Python 2.7.1 |EPD 7.0-1 (32-bit)| (r271:86832, Dec 3 2010, 15:41:32)
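The two fixes proposed in the ticket can be sketched for 1-D input as below. Both function names are hypothetical and only illustrate the two policies: propagate NaN if any element is NaN, or drop NaNs first so they do not count toward the number of elements. In later NumPy releases these correspond roughly to `np.median` and `np.nanmedian` respectively.

```python
import numpy as np

def median_propagate_nan(values):
    """Hypothetical: return NaN if any element is NaN, else the ordinary median."""
    a = np.asarray(values, dtype=float)
    if np.isnan(a).any():
        return np.nan
    return np.median(a)

def median_drop_nan(values):
    """Hypothetical: remove NaNs before computing the median."""
    a = np.asarray(values, dtype=float)
    kept = a[~np.isnan(a)]
    if kept.size == 0:
        return np.nan  # all values were NaN: no meaningful median
    return np.median(kept)

data = [np.nan, 10, 11, 12]
print(median_propagate_nan(data))  # nan
print(median_drop_nan(data))       # 11.0
```

Returning NaN explicitly for an all-NaN input avoids calling `np.median` on an empty array.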