np.ma.masked singleton causes difficulties · Issue #5806 · numpy/numpy


Open
ahaldane opened this issue Apr 27, 2015 · 1 comment

@ahaldane
Member

Whenever a MaskedArray method returns a scalar value that should be masked, it returns a reference to the singleton np.ma.masked, an instance of MaskedConstant. I think the motivation was to be able to write code like 'if result is masked:'. However, returning a singleton causes some problems that make it difficult to use np.ma.MaskedArray as a 'drop in' replacement for ndarray.

Here is a summary of the problems I see, in the hope that they can be fixed.

1. Many operations involving masked are coerced to float

Probably the worst issue is that masked has a float dtype. This means the return value of a method may be of a different type than the original array. This is especially bad for boolean arrays. For example, if a is a boolean array but all or any returns masked, the following will fail, since ~ is not defined for floats.

>>> a = np.ma.array([True, True], mask=[True, True])
>>> ~a.all()

It also means that some sequences of operations on masked arrays will be cast to float when the same operations on ndarrays would not be.
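A minimal sketch of this first problem, assuming a NumPy version in which the behaviour described above still holds. `invert_or_masked` is a hypothetical helper, not a NumPy function; it only shows one way to guard against the singleton before applying a boolean-only operator.

```python
import numpy as np

a = np.ma.array([True, True], mask=[True, True])
result = a.all()

# The fully masked reduction returns the shared singleton, whose dtype
# is float64 rather than bool.
print(result is np.ma.masked)  # True
print(np.ma.masked.dtype)      # float64


def invert_or_masked(value):
    """Hypothetical helper: apply ~ unless value is the masked singleton."""
    if value is np.ma.masked:
        return np.ma.masked
    return ~value


guarded = invert_or_masked(result)  # stays masked instead of raising
```

The identity check mirrors the 'if result is masked:' idiom mentioned at the top of this issue; it works around the dtype mismatch but does not remove it.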

2. Overwriting masked causes strange results in completely separate code

A less serious problem arises if someone assigns to the return value of a MaskedArray method, which ends up assigning to the singleton. That then affects any code, anywhere, that involves the masked singleton. I came across this when one numpy unit test modified the singleton and another read it, so I got an error depending on the order in which the unit tests were run. The problem arises in cases like marr2[:] = marr1.method() if the method returns masked. This means marr2 will get filled with arbitrary garbage, but maybe that's not a problem since those values will be masked garbage. (It was a somewhat confusing bug to fix, though.)
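To illustrate why a write made through one return value is visible in unrelated code, the sketch below checks object identity only; it does not modify the singleton. The array names are made up for the example.

```python
import numpy as np

ints = np.ma.array([1, 2], mask=[True, True])
floats = np.ma.array([3.0, 4.0], mask=[True, False])

total = ints.sum()   # fully masked reduction
first = floats[0]    # indexing a masked element

# Both results are the same Python object, so any in-place change made
# through one name would be seen through the other.
print(total is first)         # True
print(total is np.ma.masked)  # True
```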

3. Code acting on a return value of a MaskedArray method can fail (if masked was returned)

Consider some code of the form

>>> result = arr.sum()
>>> dosomething(result)

This might work fine most of the time, but fail in the (possibly rare) case that the sum returns the singleton. It might be that the operation is not allowed on np.ma.masked, or that the operation alters np.ma.masked so that later uses of it no longer behave as before.

Most cases of 'dosomething' I checked seem OK, but here are some that cause problems:

a) What if someone decides to remove the mask on a return value? E.g.
>>> result = arr.sum()
>>> result.mask = False

If arr.sum happened to return the masked singleton, this would cause havoc. @rgommers suggested making .mask read-only, which sounds like a good idea to me, although it means the code will run fine for most arr but will raise an error in the possibly rare case that the sum is fully masked.

b) writing to a scalar

Consider

>>> result[()] = 6

which would be fine for ndarrays, but raises an error for masked arrays if result is masked (though it's hard to imagine a case where someone would want to index a numpy scalar this way).

c) passing masked as the out parameter of a ufunc
>>> np.ma.log(inputarr, out=result)

I think (though there are other bugs involved here) that using a variable which might be np.ma.masked as the out parameter to a ufunc will cause problems.
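One defensive pattern for the cases above is to replace the singleton with a private 0-d masked array before mutating the result. This is only a sketch: `private_scalar` is a hypothetical helper, not part of NumPy, and whether a fresh masked scalar is the right substitute depends on the calling code.

```python
import numpy as np


def private_scalar(result, dtype):
    """Hypothetical helper: never hand back the shared singleton.

    If result is np.ma.masked, return a new 0-d MaskedArray of the
    requested dtype with its mask set; otherwise return result unchanged.
    """
    if result is np.ma.masked:
        return np.ma.array(0, mask=True, dtype=dtype)
    return result


arr = np.ma.array([1, 2], mask=[True, True])
result = private_scalar(arr.sum(), arr.dtype)

print(result is np.ma.masked)  # False
```

Later writes to result then act on the private object rather than on np.ma.masked, and the result keeps the dtype of the original array.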

@bayliffe

Rather than open another issue, we thought we would just add an example that has caused us some problems.

We set up a very simple masked array:

>>> masked_array = np.ma.MaskedArray([2.0,1.0], mask=[True, False])
>>> print(masked_array._fill_value)
1e+20

We slice this on the first index and print the result:

>>> masked_array_slice_0 = masked_array[0]
>>> print(masked_array_slice_0)
--
>>> print(masked_array_slice_0.data)
0.0
>>> type(masked_array_slice_0)
<class 'numpy.ma.core.MaskedConstant'>

Note that the value that was underneath the mask has been changed to 0.

If we then slice on the second:

>>> masked_array_slice_1 = masked_array[1]
>>> print(masked_array_slice_1)
1.0
>>> print(masked_array_slice_1.data)
<memory at 0x7fbe74573048>
>>> type(masked_array_slice_1)
<class 'numpy.float64'>

In summary, depending on how we slice:

  • we lose the original value contained within our array if we extract the masked point (a 2 became a 0, which is not the fill value)
  • we get a different return type depending upon which index we extract (similar to "np.ma.masked is not a scalar", #7588)
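If the underlying value and a consistent return type matter, one possible workaround (a sketch, not a fix for the behaviour described above) is to read from .data directly, or to take a length-one slice instead of a scalar index.

```python
import numpy as np

masked_array = np.ma.MaskedArray([2.0, 1.0], mask=[True, False])

# Reading through .data bypasses the mask and keeps the stored value.
raw_value = masked_array.data[0]
print(raw_value)  # 2.0

# A length-one slice stays a MaskedArray for both elements, and the
# value under the mask is preserved.
slice_0 = masked_array[0:1]
slice_1 = masked_array[1:2]
print(type(slice_0) is type(slice_1))  # True
print(slice_0.data[0], slice_0.mask[0])
```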
