Adding np.nanmean(), nanstd(), and nanvar() by WeatherGod · Pull Request #3297 · numpy/numpy · GitHub
Adding np.nanmean(), nanstd(), and nanvar() #3297

Closed
WeatherGod wants to merge 4 commits
Adding np.nanmean(), np.nanstd(), np.nanvar()
WeatherGod committed May 2, 2013
commit de30692f0c8c677553127063e360c1514dd3e32b
79 changes: 78 additions & 1 deletion numpy/core/_methods.py
@@ -7,7 +7,7 @@

from numpy.core import multiarray as mu
from numpy.core import umath as um
-from numpy.core.numeric import asanyarray
+from numpy.core.numeric import array, asanyarray, isnan

def _amax(a, axis=None, out=None, keepdims=False):
    return um.maximum.reduce(a, axis=axis,
@@ -61,6 +61,26 @@ def _mean(a, axis=None, dtype=None, out=None, keepdims=False):
        ret = ret / float(rcount)
    return ret

def _nanmean(a, axis=None, dtype=None, out=None, keepdims=False):
    arr = array(a, subok=True)
Member

I was going to say that it might be better to use a base class + wrap at the end (for matrix support, though matrix support is bad anyway...), but the non-nan code does the same. Which makes me wonder: would it be sensible to just add a where= kwarg instead, making the nan-funcs just tiny wrappers? Of course, I could dream about having where= for the usual ufunc.reduce, but I think it would probably require larger additions to the nditer.

Contributor Author

I think @mwiebe did something along those lines at one point with the NA work, but it got pulled out. I seriously want a where= kwarg in the ufunc architecture so that I can "fix" masked arrays making copies of themselves whenever one does a min or a max.

Member

IIRC, there is a where= in ufunc.__call__ now, but Mark didn't get around to implementing it for ufunc.reduce. It would be great to have, definitely.
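For readers following the where= idea: the sketch below shows what a "tiny wrapper" nan-aware mean could look like if ufunc.reduce accepted a where= mask. It assumes a NumPy release newer than this PR in which np.add.reduce takes where=; the function name nanmean_via_where is hypothetical and is not part of this PR.

```python
import numpy as np

def nanmean_via_where(a, axis=None):
    """Hypothetical wrapper: NaN-ignoring mean built on a masked reduction."""
    arr = np.asanyarray(a, dtype=float)
    valid = ~np.isnan(arr)
    # Sum only the non-NaN entries; no copy of `arr` with zeros filled in.
    total = np.add.reduce(arr, axis=axis, where=valid)
    count = valid.sum(axis=axis)
    # A slice with no valid entries gives 0.0 / 0, which is NaN.
    with np.errstate(invalid="ignore", divide="ignore"):
        return total / count
```

Unlike the copyto-based approach in this diff, the masked reduction never writes zeros into a copy of the input.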

    mask = isnan(arr)

    # Upgrade bool, unsigned int, and int to float64
Member

Cast instead of upgrade.

    if dtype is None and arr.dtype.kind in ['b','u','i']:
Member

Hmm, issubdtype would be better.

if issubdtype(dtype, np.integer) or issubdtype(dtype, np.bool):

If you want to modify issubdtype to take a tuple second argument I won't complain ;)

Member

@WeatherGod Still needs fixing.
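To make the two styles of check concrete, here is a minimal sketch comparing the dtype.kind test from the diff with an issubdtype-based test. It uses the public numpy namespace rather than the core-internal imports, and np.bool_ (the NumPy scalar type) instead of np.bool; both helper names are hypothetical.

```python
import numpy as np

def is_bool_or_int_by_kind(dtype):
    # The style used in the diff: inspect the one-character kind code.
    return np.dtype(dtype).kind in "bui"

def is_bool_or_int_by_type(dtype):
    # The style suggested in review: walk the scalar type hierarchy.
    dtype = np.dtype(dtype)
    return np.issubdtype(dtype, np.integer) or np.issubdtype(dtype, np.bool_)
```

For the built-in bool, signed, unsigned, float, and complex dtypes the two helpers should agree; the type-based form mainly reads more explicitly.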

        ret = um.add.reduce(arr, axis=axis, dtype='f8',
                            out=out, keepdims=keepdims)
    else:
        mu.copyto(arr, 0.0, where=mask)
        ret = um.add.reduce(arr, axis=axis, dtype=dtype,
                            out=out, keepdims=keepdims)
    rcount = (~mask).sum(axis=axis)
    if isinstance(ret, mu.ndarray):
        ret = um.true_divide(ret, rcount,
                             out=ret, casting='unsafe', subok=False)
Member

Hmm, that is going to truncate rather than round if the output is integer, which doesn't seem right. Might want to document that.

Contributor Author

Then that would happen in regular _mean() as well, right?
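The truncation being discussed can be reproduced in isolation. The sketch below assumes integer sums (as if an integer dtype had been requested) and divides them in place with casting='unsafe', as the diff does; the variable names and the rint-based alternative are illustrative only and are not something this PR proposes.

```python
import numpy as np

sums = np.array([7, 8, -7])    # integer sums, standing in for `ret`
counts = np.array([2, 2, 2])   # standing in for `rcount`

# Same call pattern as the diff: the float quotient is cast back to int,
# which drops the fractional part (truncation toward zero).
truncated = sums.copy()
np.true_divide(truncated, counts, out=truncated, casting="unsafe")

# One possible alternative: divide in floating point, then round explicitly.
rounded = np.rint(np.true_divide(sums, counts)).astype(sums.dtype)
```

Here truncated holds 3, 4, -3, while rounded holds 4, 4, -4 (np.rint rounds halves to the nearest even integer).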

    else:
        ret = ret / float(rcount)
Member

What if rcount is 0?

Contributor Author

Then you get NaN. Exactly as intended.
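A small sketch of the rcount == 0 case, assuming the sum is a NumPy float scalar (which is what add.reduce returns for a full reduction). A plain Python float would behave differently, so that case is shown for contrast; the variable names are illustrative.

```python
import numpy as np

total = np.float64(0.0)   # sum after every NaN was replaced by 0.0
rcount = 0                # no valid entries were counted

# NumPy scalar division: 0.0 / 0.0 yields NaN (with a RuntimeWarning unless silenced).
with np.errstate(invalid="ignore"):
    result = total / float(rcount)

# A plain Python float raises instead of returning NaN.
try:
    0.0 / float(rcount)
    python_float_raises = False
except ZeroDivisionError:
    python_float_raises = True
```

So the "you get NaN" outcome relies on ret being a NumPy scalar or array rather than a built-in float.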

    return ret

def _var(a, axis=None, dtype=None, out=None, ddof=0,
         keepdims=False):
    arr = asanyarray(a)
@@ -101,6 +121,52 @@ def _var(a, axis=None, dtype=None, out=None, ddof=0,

    return ret

def _nanvar(a, axis=None, dtype=None, out=None, ddof=0,
            keepdims=False):
    arr = array(a, subok=True)
    mask = isnan(arr)

    # First compute the mean, saving 'rcount' for reuse later
    if dtype is None and arr.dtype.kind in ['b','u','i']:
Member

See above.

        arrmean = um.add.reduce(arr, axis=axis, dtype='f8', keepdims=True)
    else:
        mu.copyto(arr, 0.0, where=mask)
        arrmean = um.add.reduce(arr, axis=axis, dtype=dtype,
                                keepdims=True)
    rcount = (~mask).sum(axis=axis, keepdims=True)
    if isinstance(arrmean, mu.ndarray):
        arrmean = um.true_divide(arrmean, rcount,
                                 out=arrmean, casting='unsafe', subok=False)
    else:
        arrmean = arrmean / float(rcount)

    # arr - arrmean
    x = arr - arrmean
    x[mask] = 0.0

    # (arr - arrmean) ** 2
    if arr.dtype.kind == 'c':
Member

if issubdtype(arr.dtype, np.complex):

Member

@WeatherGod still needs fixing.

Contributor Author

Where do I get the complex (and integer and others) classes from in this context? I get really confused in these core libraries because they don't import numpy like I am used to.

Member

Looking around, I may have misled you. The types are in numerictypes, so do

from . import numerictypes as nt

then you can do

if issubclass(dtype.type, nt.complexfloating):
    blah

etc. You can see examples in numpy/core/arrayprint.py. The issubdtype function is in there also.

But now I'm wondering why these functions are in _methods.py since they aren't array methods or called from c code.
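As a rough illustration of the issubclass pattern described above, the sketch below uses np.complexfloating from the public numpy namespace (standing in for nt.complexfloating inside numpy/core) and applies it to the squared-magnitude step of the variance. Both function names are hypothetical.

```python
import numpy as np

def is_complex_dtype(dtype):
    # dtype.type is the scalar class, so issubclass walks the scalar hierarchy.
    return issubclass(np.dtype(dtype).type, np.complexfloating)

def abs_squared(x):
    """Elementwise |x|**2, mirroring the complex/real branch in _nanvar."""
    x = np.asarray(x)
    if is_complex_dtype(x.dtype):
        # z * conj(z) is real-valued; .real drops the zero imaginary part.
        return np.multiply(x, np.conjugate(x)).real
    return np.multiply(x, x)
```

For example, abs_squared([3 + 4j]) evaluates to an array holding 25.0.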

Contributor Author

So... after implementing this, my tests started failing. Everything returned NaNs. Banged my head against the wall for a few days until I started to step through the code. Now, would someone kindly tell me why the following returns True??!?

np.issubdtype(np.float64, np.bool)

Member

Because it's buggy :) (I am pretty sure there is an open issue for this somewhere). Or, if you like, because np.bool is Python's bool and np.bool_ is the actual NumPy type, and the Python type misbehaves.
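A short check of the distinction drawn here, written against np.bool_ so that it does not depend on how a given NumPy version treats the np.bool name. It only demonstrates that the NumPy scalar type gives the expected answers; it does not reproduce the buggy result reported above.

```python
import numpy as np

# With the NumPy scalar type, a float dtype is not reported as a boolean subtype.
float_is_bool = np.issubdtype(np.float64, np.bool_)

# The dtype built from Python's bool resolves to the NumPy scalar type np.bool_.
bool_is_bool = np.issubdtype(np.dtype(bool), np.bool_)
bool_scalar_type = np.dtype(bool).type
```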

        x = um.multiply(x, um.conjugate(x), out=x).real
    else:
        x = um.multiply(x, x, out=x)

    # add.reduce((arr - arrmean) ** 2, axis)
    ret = um.add.reduce(x, axis=axis, dtype=dtype, out=out,
                        keepdims=keepdims)

    # add.reduce((arr - arrmean) ** 2, axis) / (n - ddof)
Member

(n - ddof) could be negative, no?

Contributor Author

Yes, that is a possibility, and I was wondering about that. Do note that the same issue exists for _var() and _std(), so we probably should fix it there too and add tests.
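One possible way to guard the (n - ddof) divisor, sketched here purely for illustration: treat any non-positive degrees of freedom as undefined and substitute NaN. This is an assumption about how such a fix might look, not what this PR implements, and safe_dof is a hypothetical helper.

```python
import numpy as np

def safe_dof(count, ddof):
    """Return count - ddof as floats, with NaN wherever the result is <= 0."""
    dof = np.asarray(count, dtype=float) - ddof
    return np.where(dof > 0, dof, np.nan)
```

Dividing by the guarded value then yields NaN for slices with too few valid entries, instead of a negative or infinite variance.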

    if not keepdims and isinstance(rcount, mu.ndarray):
        rcount = rcount.squeeze(axis=axis)
    rcount -= ddof
    if isinstance(ret, mu.ndarray):
        ret = um.true_divide(ret, rcount,
                             out=ret, casting='unsafe', subok=False)
    else:
        ret = ret / float(rcount)

    return ret
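The overall shape of the _nanvar computation above can be summarized in a compact sketch using only the public numpy API. It assumes real-valued floating-point input and omits the dtype, out, and keepdims handling; nanvar_sketch is a hypothetical name, not the function added by this PR.

```python
import numpy as np

def nanvar_sketch(a, axis=None, ddof=0):
    """NaN-ignoring variance for real input: zero out NaNs, count valid entries."""
    arr = np.array(a, dtype=float)          # copy, so the caller's data is untouched
    mask = np.isnan(arr)
    arr[mask] = 0.0                         # NaNs contribute nothing to the sum
    count = (~mask).sum(axis=axis, keepdims=True)
    with np.errstate(invalid="ignore", divide="ignore"):
        mean = arr.sum(axis=axis, keepdims=True) / count
        dev = arr - mean
        dev[mask] = 0.0                     # masked slots must not add (0 - mean)**2
        return (dev * dev).sum(axis=axis) / ((~mask).sum(axis=axis) - ddof)
```

On inputs that contain at least ddof + 1 valid values per slice, the result should match np.nanvar as released in later NumPy versions.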


def _std(a, axis=None, dtype=None, out=None, ddof=0, keepdims=False):
    ret = _var(a, axis=axis, dtype=dtype, out=out, ddof=ddof,
               keepdims=keepdims)
@@ -111,3 +177,14 @@ def _std(a, axis=None, dtype=None, out=None, ddof=0, keepdims=False):
        ret = um.sqrt(ret)

    return ret

def _nanstd(a, axis=None, dtype=None, out=None, ddof=0, keepdims=False):
    ret = _nanvar(a, axis=axis, dtype=dtype, out=out, ddof=ddof,
                  keepdims=keepdims)

    if isinstance(ret, mu.ndarray):
        ret = um.sqrt(ret, out=ret)
    else:
        ret = um.sqrt(ret)

    return ret