Whenever a MaskedArray method would return a scalar value that should be masked, it instead returns a reference to the singleton instance np.ma.masked, of type MaskedConstant. I think the motivation was to be able to write code like 'if result is masked:'. However, returning a singleton causes some problems that make it difficult to use np.ma.MaskedArray as a 'drop in' replacement for ndarray.
Here is a summary of the problems I see, in the hope that they can be fixed.
1. Many operations involving masked are coerced to float
Probably the worst issue is that masked is of type float. This means the return value of a method may be of a different type than the original array. This is especially bad for boolean arrays. For example, if a is a boolean array but a.all() or a.any() returns masked, the following line will fail, since you cannot apply ~ to a float.
>>> a = np.ma.array([True, True], mask=[True, True])
>>> ~a.all()
It also means that certain series of operations on masked arrays will sometimes get cast to float when they wouldn't be with ndarrays.
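To make the type mismatch concrete, here is a minimal sketch that inspects what a.all() returns for a fully masked boolean array. The variable names are mine, and the exact error message from ~ may vary between NumPy versions, so it is only printed, not relied upon.

```python
import numpy as np

# A fully masked boolean array: every element is hidden.
a = np.ma.array([True, True], mask=[True, True])

result = a.all()
returned_singleton = result is np.ma.masked  # identity check against the singleton
singleton_dtype = np.ma.masked.dtype         # dtype carried by the singleton itself

print(a.dtype)             # dtype of the original array
print(returned_singleton)
print(singleton_dtype)

# ~ is not defined for floating-point data, so inverting the result fails
# even though the original array was boolean.
try:
    inverted = ~result
except TypeError as exc:
    inverted = None
    print("invert failed:", exc)
```

The point is that the dtype of the result comes from the singleton rather than from a.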
2. Overwriting masked causes strange results in completely separate code
A less serious problem arises if someone tries to assign to the return value of a MaskedArray method, which would end up assigning to the singleton. That will then affect any code, anywhere, that involves the masked singleton. I came across this when one numpy unit test modified the singleton and another one read it, so I would get an error depending on the order in which the unit tests were run. The problem arises in cases like marr2[:] = marr1.method() if the method returns masked. This means marr2 will get filled with arbitrary garbage, but maybe that's not a problem since those values will be masked garbage. (Although it was a somewhat confusing bug to fix.)
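As a rough illustration of the marr2[:] = marr1.method() pattern, the sketch below assigns a fully masked sum into another array and then prints the resulting mask and the underlying data buffer. The array contents are made up for the example, and what ends up in .data may depend on the NumPy version, so no particular output is assumed for it.

```python
import numpy as np

marr1 = np.ma.array([1, 2, 3], mask=[True, True, True])  # fully masked
marr2 = np.ma.array([10, 20, 30])                         # nothing masked

value = marr1.sum()  # fully masked input, so this is the masked singleton
marr2[:] = value     # assign the singleton into every element of marr2

print(value is np.ma.masked)
print(marr2.mask)  # which elements of marr2 are now hidden
print(marr2.data)  # underlying buffer sitting behind the mask
```

Whatever sits in marr2.data afterwards is hidden by the mask, which is why this usually goes unnoticed.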
3. Code acting on a return value of a MaskedArray method can fail (if masked was returned)
Consider some code of the form
>>> result = arr.sum()
>>> dosomething(result)
This might work fine most of the time, but fail in the (possibly rare) case that the sum returns the singleton. It might be that the operation is not allowed on np.ma.masked, or it might be that further use of np.ma.masked wouldn't work as before.
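One defensive pattern, sketched here under the assumption that a plain fallback value is acceptable to the caller, is to test for the singleton by identity before passing the result on. unwrap_scalar is a hypothetical helper of mine, not part of numpy.

```python
import numpy as np

def unwrap_scalar(result, default=None):
    """Hypothetical helper: replace the masked singleton with `default`."""
    if result is np.ma.masked:
        return default
    return result

fully_masked = np.ma.array([1, 2, 3], mask=[True, True, True])
partly_masked = np.ma.array([1, 2, 3], mask=[False, True, False])

# sum() returns the singleton only when every element is masked.
total_full = unwrap_scalar(fully_masked.sum(), default=0)
total_part = unwrap_scalar(partly_masked.sum(), default=0)

print(total_full, total_part)
```

This is exactly the kind of special-casing that an ndarray user never has to write, which is the drop-in problem described above.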
Most cases of 'dosomething' I checked seem OK, but here are some that cause problems:
a) What if someone decides to remove the mask on a return value? E.g.
>>> result = arr.sum()
>>> result.mask = False
If arr.sum() happened to return the masked singleton, this would cause havoc. @rgommers suggested making .mask read-only, which sounds like a good idea to me, although it means the code will run fine for most arr but will raise an error in the possibly rare case that the sum is fully masked.
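A possible workaround on the caller's side (my suggestion, not something proposed in this issue) is to request keepdims=True, so that the reduction returns an ordinary one-element MaskedArray instead of the shared singleton. Clearing the mask then only touches that private array.

```python
import numpy as np

arr = np.ma.array([1, 2, 3], mask=[True, True, True])  # fully masked

# keepdims=True keeps a length-1 axis, so the result is a regular
# MaskedArray rather than a scalar that collapses to np.ma.masked.
result = arr.sum(keepdims=True)
is_singleton = result is np.ma.masked

result.mask = False  # only affects this one-element array

print(is_singleton)
print(result.shape, result.mask)
print(np.ma.masked.mask)  # mask of the shared singleton
```

The cost is that the caller now holds a one-element array rather than a scalar and has to index it explicitly.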
b) writing to a scalar
Consider
>>> result[()] = 6
which would be fine for ndarrays, but raises an error for masked arrays if result is masked (though it's hard to imagine a case where someone would want to index a numpy scalar this way).
c) passing masked as the out parameter of a ufunc
>>> np.ma.log(inputarr, out=result)
I think (though there are other bugs involved here) that using a variable which might be np.ma.masked as the out parameter to a ufunc will cause problems.