Do not warn when invalid values are masked in a MaskedArray · Issue #4959 · numpy/numpy · GitHub
Open
gerritholl opened this issue Aug 12, 2014 · 13 comments · Fixed by #16022 or #21977 · May be fixed by #22914
Comments

@gerritholl
Contributor

When doing a mathematical operation on a masked array, and some elements are invalid for the particular mathematical operation, numpy should not issue a warning if all invalid elements are masked.

This cannot be handled by numpy.seterr: I want numpy to warn or raise when performing an invalid operation on any non-masked value, but to be silent when performing an invalid operation on a masked value.

In [1]: A = arange(-2, 5)/2

In [2]: Am = numpy.ma.masked_less_equal(A, 0)

In [3]: numpy.log10(Am)
/export/data/home/gholl/venv/gerrit/bin/ipython3:1: RuntimeWarning: divide by zero encountered in log10
  #!/export/data/home/gholl/venv/gerrit/bin/python3.4
/export/data/home/gholl/venv/gerrit/bin/ipython3:1: RuntimeWarning: invalid value encountered in log10
  #!/export/data/home/gholl/venv/gerrit/bin/python3.4
Out[3]: 
masked_array(data = [-- -- -- -0.3010299956639812 0.0 0.17609125905568124 0.3010299956639812],
             mask = [ True  True  True False False False False],
       fill_value = 1e+20)
@njsmith
Member
njsmith commented Aug 12, 2014

This could probably be fixed using a combination of __numpy_ufunc__ and
where=, if anyone wants to try.

Nathaniel J. Smith
Postdoctoral researcher - Informatics - University of Edinburgh
http://vorpus.org
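
To illustrate the where= half of that suggestion, here is a minimal sketch, not the NumPy implementation. The helper name apply_unmasked and its fill parameter are made up for this example. It assumes that elements skipped via where= are never computed, so they cannot trigger floating-point warnings, while unmasked invalid values still do.

```python
import numpy as np


def apply_unmasked(ufunc, marr, fill=0.0):
    """Apply a unary ufunc only to the unmasked elements of a masked array.

    Hypothetical helper: masked slots keep `fill` in the underlying data.
    """
    data = np.ma.getdata(marr)
    mask = np.ma.getmaskarray(marr)
    out = np.full(data.shape, fill, dtype=float)
    # where= skips the masked elements entirely, so no error flags are set for them.
    ufunc(data, out=out, where=~mask)
    return np.ma.array(out, mask=mask)


A = np.arange(-2, 5) / 2
Am = np.ma.masked_less_equal(A, 0)

# Even with every floating-point error promoted to an exception,
# the masked non-positive values do not raise.
with np.errstate(all="raise"):
    result = apply_unmasked(np.log10, Am)
print(result)
```

An unmasked invalid value, for example np.ma.array([-1.0, 1.0]) passed through the same helper, would still warn or raise according to the active error state, which is the behaviour requested in the issue.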

@kuma14142

A little experimentation showed that np.ma.log10() worked on the example above. Is the intention for the default math functions to call the masked array function?

In a closely related issue, supplying nan values in masked data causes an error in the safe domain calculations. Here the array's data is used in comparison functions even if it is masked. For example np.array([np.nan]) > np.array([4]) should and does raise an error.

The test system is Debian 7.8 with the latest numpy 1.9.2 installed in ~/.local
(The same behaviour is obtained from WinPython 2.7.9, numpy 1.9.2)

Test code

import sys
import numpy as np

print(sys.version)
print("np.__version__ {0}".format(np.__version__))

A = np.arange(-2, 5)/2
Am = np.ma.masked_less_equal(A, 0)
print('np.ma.log10(Am) works OK.') 
tm = np.ma.log10(Am)
print(tm.data)
print(tm.mask)
print('np.log10(Am) raises warning')
t = np.log10(Am)
print(t.data)
print(t.mask)

# Now my similar issue
np.seterr(all='raise') # provides full traceback
x = np.ma.array([1.0, np.nan, 3.0], mask=[False, True, False])
print(x)
print(1/x) # generates an error

Output

2.7.3 (default, Mar 13 2014, 11:03:55)
[GCC 4.7.2]
np.__version__ 1.9.2
np.ma.log10(Am) works OK.
[-1.      -1.       0.       0.       0.       0.       0.30103]
[ True  True  True  True False False False]
np.log10(Am) raises warning
ma_error.py:15: RuntimeWarning: divide by zero encountered in log10
  t = np.log10(Am)
ma_error.py:15: RuntimeWarning: invalid value encountered in log10
  t = np.log10(Am)
[ 1.       1.       1.       1.       0.       0.       0.30103]
[ True  True  True  True False False False]
[1.0 -- 3.0]
Traceback (most recent call last):
  File "ma_error.py", line 23, in <module>
    print(1/x) # generates an error
  File "~/.local/lib/python2.7/site-packages/numpy/ma/core.py", line 2875, in __array_wrap__
    d = filled(domain(*args), True)
  File "~/.local/lib/python2.7/site-packages/numpy/ma/core.py", line 790, in __call__
    return umath.absolute(a) * self.tolerance >= umath.absolute(b)
FloatingPointError: invalid value encountered in greater_equal

Notes

I had a quick look at passing the mask (the OR of the input values' masks) to the domain function and testing only the non-masked values. Something like this for line 790:

ret = np.ones_like(m, dtype=np.bool_)
ret[~m] = umath.absolute(a[~m]) * self.tolerance >= umath.absolute(b[~m])
return ret
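
A standalone sketch of the same idea, outside of numpy/ma/core.py, might look like the following. The function name safe_divide_domain and its signature are hypothetical, and the default tolerance is only an assumption chosen for the example. Masked positions are reported as "unsafe" without ever being compared, so a masked nan cannot trigger an error.

```python
import numpy as np


def safe_divide_domain(a, b, mask, tolerance=np.finfo(float).tiny):
    """Return True where a / b is unsafe, testing only unmasked positions.

    Hypothetical stand-alone version of the domain check discussed above.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    mask = np.asarray(mask, dtype=bool)
    a, b, mask = np.broadcast_arrays(a, b, mask)

    # Start from "everything unsafe" so masked slots stay flagged.
    unsafe = np.ones(mask.shape, dtype=np.bool_)
    valid = ~mask
    # Only the unmasked values take part in the comparison.
    unsafe[valid] = np.absolute(a[valid]) * tolerance >= np.absolute(b[valid])
    return unsafe


with np.errstate(all="raise"):
    flags = safe_divide_domain(1.0, [1.0, np.nan, 3.0], [False, True, False])
print(flags)
```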

@argriffing
Contributor

Just to summarize, there are two issues. One is that np.log(m) has different behavior than np.ma.log(m) when m is a masked array. A separate issue is that np.ma.log has trouble with masked-out nan although it can deal intelligently with other masked-out values outside of the domain of log. These issues generalize to functions other than log.

>>> import numpy as np
>>> np.log(np.ma.array([1, -1], mask=[0, 1]))
__main__:1: RuntimeWarning: invalid value encountered in log
masked_array(data = [0.0 --],
             mask = [False  True],
       fill_value = 1e+20)

>>> np.ma.log(np.ma.array([1, np.nan], mask=[0, 1]))
/numpy/ma/core.py:805: RuntimeWarning: invalid value encountered in less_equal
  return umath.less_equal(x, self.critical_value)
masked_array(data = [0.0 --],
             mask = [False  True],
       fill_value = 1e+20)

@kuma14142

On the second issue, it is not just np.ma.log10 but a number of other functions that don't handle np.nan values in masked elements.

@bidhya
bidhya commented Dec 31, 2016

This issue still exists in NumPy version 1.11.2.

@efiring
Contributor
efiring commented Jan 16, 2017

The problem also appears in simple comparisons:

In [2]: np.ma.masked_invalid([1.1, np.nan]) > 1
/Users/efiring/anaconda/envs/test/bin/ipython:1: RuntimeWarning: invalid value encountered in greater
  #!/Users/efiring/anaconda/envs/test/bin/python

masked_array(data = [True --],
             mask = [False  True],
       fill_value = True)

@cosama
cosama commented Apr 13, 2020

How would I do the following:

d = np.ma.array([1, None], mask=[False, True])
d[d > 0] =  0

This currently raises a TypeError: '>' not supported between instances of 'NoneType' and 'int'.

I love numpy and think masked arrays are a great thing to have, but not having this fixed makes them quite useless in many cases.

More than happy to help, if anybody provides me with some pointers.

Edit
Seems to be an issue here:

numpy/numpy/ma/core.py

Lines 4045 to 4061 in efa7bad

if mask.dtype.names is not None:
    # For possibly masked structured arrays we need to be careful,
    # since the standard structured array comparison will use all
    # fields, masked or not. To avoid masked fields influencing the
    # outcome, we set all masked fields in self to other, so they'll
    # count as equal. To prepare, we ensure we have the right shape.
    broadcast_shape = np.broadcast(self, odata).shape
    sbroadcast = np.broadcast_to(self, broadcast_shape, subok=True)
    sbroadcast._mask = mask
    sdata = sbroadcast.filled(odata)
    # Now take care of the mask; the merged mask should have an item
    # masked if all fields were masked (in one and/or other).
    mask = (mask == np.ones((), mask.dtype))
else:
    # For regular arrays, just use the data as they come.
    sdata = self.data

It looks like a fix has been implemented for structured arrays; maybe it is enough to always take the if branch?

Edit 2
I was wrong, this is only called for __eq__ and __ne__. The __lt__, __le__, __gt__ and __ge__ methods seem to be not defined for a masked array.

Edit 3
Found a slightly hacky workaround, in case someone runs into the same issue:

tmp = d.data[~d.mask]
tmp[tmp > 0] = 0
d.data[~d.mask] = tmp
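
The same workaround can be wrapped in a small helper. This is only a sketch: the name assign_where_unmasked and its predicate argument are invented for illustration and are not part of numpy.ma. The predicate is evaluated on the unmasked values only, so a masked None (or nan) is never compared.

```python
import numpy as np


def assign_where_unmasked(arr, predicate, value):
    """Set arr to `value` where predicate(unmasked data) holds, in place.

    Hypothetical helper generalizing the three-line workaround above.
    """
    valid = ~np.ma.getmaskarray(arr)
    hits = np.zeros(arr.shape, dtype=bool)
    # Evaluate the predicate on unmasked values only.
    hits[valid] = predicate(arr.data[valid])
    # arr.data is a view of the underlying buffer, so this writes through.
    arr.data[hits] = value
    return arr


d = np.ma.array([1, None], mask=[False, True])
assign_where_unmasked(d, lambda values: values > 0, 0)
print(d)
```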

@eric-wieser
Member

@cosama: please open a new issue, that seems unrelated

@cosama
cosama commented Apr 13, 2020

@eric-wieser: I can do that, but if I change None to np.nan in the example I provided, it causes the warning issue mentioned here #4959 (comment). It is just a more extreme case of the issue, where instead of causing a warning it actually breaks.

Edit:

I opened a new issue concerning the comparison operations here: #15978, so that the discussion here can focus on the ufunc implementations.

@greglucas
Contributor

I've been running into this warning when passing values to a log-scaled axis in Matplotlib, where I've masked the bad values before passing in, so I expect everything to be OK.

The problem is that np.log10(x) will call the standard ufunc even if x is a MaskedArray. The warning is avoided if instead, I call np.ma.log10(x) which then goes into the Masked ufunc implementation.

@efiring, I started using your simple example, but it is actually fixed in master as a side-effect of #15230.

I think the action item would be that if a ufunc gets a MaskedArray as input, it dispatches to the masked implementation of that ufunc, if one exists. I'm not sure whether the current behaviour was an explicit design decision, though.

@greglucas
Contributor

It looks like MaskedArray does not define an __array_ufunc__ method, which could be used to dispatch to the local masked functions in the module. I just tried this quick prototype, which seemed to solve this issue.

    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        HANDLED_FUNCTIONS = {'log10': log10}
        func = HANDLED_FUNCTIONS.get(ufunc.__name__, None)
        if func:
            return getattr(func, method)(*inputs, **kwargs)
        return NotImplemented

Is this something that is worth implementing on the MaskedArray class? If so, I'd be happy to try and write up a PR to generalize this to all ufuncs in the np.ma module already.

I tried to see if anyone had worked on this before, but it doesn't look like it. This idea was brought up in #15200 suggesting this may be a good thing to do.
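
As a rough sketch of how the HANDLED_FUNCTIONS table in the prototype could be generalized, the lookup can be done by name against the np.ma namespace. This is an assumption about one possible approach, not what any PR actually implemented. The function call_masked_variant is hypothetical and is written as a free function rather than as an __array_ufunc__ override, to keep the example independent of MaskedArray internals.

```python
import numpy as np


def call_masked_variant(ufunc, *inputs, **kwargs):
    """Call the np.ma function with the same name as `ufunc`, if there is one.

    Hypothetical helper; falls back to the plain ufunc when np.ma has no match.
    """
    ma_func = getattr(np.ma, ufunc.__name__, None)
    if ma_func is None:
        return ufunc(*inputs, **kwargs)
    return ma_func(*inputs, **kwargs)


A = np.arange(-2, 5) / 2
Am = np.ma.masked_less_equal(A, 0)

# np.ma.log10 handles the out-of-domain values itself, so nothing is raised.
with np.errstate(all="raise"):
    logs = call_masked_variant(np.log10, Am)
print(logs)
```

Note that np.ma.log10 silences invalid values for all elements, masked or not, so this route does not by itself give the "warn only for unmasked values" behaviour asked for at the top of the issue.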

@rgommers
Member

@greglucas that sounds great, a PR would be nice.

This idea was brought up in #15200 suggesting this may be a good thing to do.

Yes, I agree with those comments.

@mattip
Member
mattip commented Jul 20, 2022

Reopening, since the PR to use __array_ufunc__ was reverted.
