Operation on masked_array changes fill_value · Issue #3762 · numpy/numpy · GitHub
Operation on masked_array changes fill_value #3762

Open
agartland opened this issue Sep 19, 2013 · 7 comments

Comments

@agartland

I first raised this issue on Stack Overflow (see the link at the bottom).

Shouldn't the new masked_array inherit the fill_value from the masked_array being summed?

Can someone explain to me this behavior of a numpy masked_array? It seems to change the fill_value after applying the sum operation, which is confusing if you intend to use the filled result.

import numpy as np

data = np.ones((5, 5))
m = np.zeros((5, 5), dtype=bool)

# Mask out row 3
m[3, :] = True
arr = np.ma.masked_array(data, mask=m, fill_value=np.nan)

print(arr)
print('Fill value:', arr.fill_value)
print(arr.filled())

# Reduce along axis 1; the result carries its own fill_value
farr = arr.sum(axis=1)
print(farr)
print('Fill value:', farr.fill_value)
print(farr.filled())

# I was expecting this
print(np.nansum(arr.filled(), axis=1))

Prints output:

[[1.0 1.0 1.0 1.0 1.0]
 [1.0 1.0 1.0 1.0 1.0]
 [1.0 1.0 1.0 1.0 1.0]
 [-- -- -- -- --]
 [1.0 1.0 1.0 1.0 1.0]]
Fill value: nan
[[  1.   1.   1.   1.   1.]
 [  1.   1.   1.   1.   1.]
 [  1.   1.   1.   1.   1.]
 [ nan  nan  nan  nan  nan]
 [  1.   1.   1.   1.   1.]]
[5.0 5.0 5.0 -- 5.0]
Fill value: 1e+20
[  5.00000000e+00   5.00000000e+00   5.00000000e+00   1.00000000e+20
   5.00000000e+00]
[  5.   5.   5.  nan   5.]
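
Until the fill_value is carried over automatically, one way to get the expected filled result is to state the fill value explicitly. The following is a minimal sketch, assuming the goal is a plain ndarray with NaN in the fully masked row; the names row_sums, filled_explicit and filled_copied are illustrative only.

```python
import numpy as np

data = np.ones((5, 5))
mask = np.zeros((5, 5), dtype=bool)
mask[3, :] = True  # mask out row 3
arr = np.ma.masked_array(data, mask=mask, fill_value=np.nan)

row_sums = arr.sum(axis=1)

# Option 1: pass the desired fill value to filled() directly,
# so the result does not depend on row_sums.fill_value.
filled_explicit = row_sums.filled(arr.fill_value)

# Option 2: copy the fill value onto the result, then call filled().
row_sums.fill_value = arr.fill_value
filled_copied = row_sums.filled()

print(filled_explicit)
print(filled_copied)
```

Both options give the same array here; option 1 leaves the result's own fill_value untouched, which may matter if the result is passed on to other code.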

http://stackoverflow.com/questions/18879272/why-does-sum-operation-on-numpy-masked-array-change-fill-value-to-1e20

@akleeman

A similar thing happens with the take operation in NumPy 1.7:

np.version.version
Out: '1.7.0'
tmp = np.random.normal(size=(3, 4))
masked = np.ma.masked_array(tmp, mask=tmp > 0, fill_value=np.nan)
masked.fill_value
Out: nan
masked.take([0, 1, 2], axis=0).fill_value
Out: 1e+20
np.all(masked.take([0, 1, 2], axis=0) == masked)
Out: True
np.all(masked.take([0, 1, 2], axis=0).filled() == masked.filled())
Out: False

It looks like the same thing would happen in NumPy 1.8.
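
A side note on the last comparison in the session above: with a fill_value of NaN, an elementwise == on filled arrays cannot return all True whenever anything is masked, because NaN never compares equal to itself. A hedged sketch of a comparison that separates the data, the mask and the fill value is shown below; filling with 0.0 is an arbitrary choice made only for the comparison, and the variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
tmp = rng.normal(size=(3, 4))
masked = np.ma.masked_array(tmp, mask=tmp > 0, fill_value=np.nan)
taken = masked.take([0, 1, 2], axis=0)

# Compare unmasked data using a finite fill value chosen for the comparison.
same_data = np.array_equal(taken.filled(0.0), masked.filled(0.0))

# Compare the masks separately.
same_mask = np.array_equal(np.ma.getmaskarray(taken),
                           np.ma.getmaskarray(masked))

print("same data:", same_data)
print("same mask:", same_mask)
# Whether the two fill values agree depends on the NumPy version.
print("fill values:", masked.fill_value, taken.fill_value)
```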

@charris
Member
charris commented Feb 15, 2014

Sounds like something reasonable to do.

@charris charris added Defect and removed Defect labels Feb 23, 2014
@alimuldal
Contributor

A related question popped up on SO today: http://stackoverflow.com/q/35324836/1461210

In versions >= 1.10.0:

import numpy as np

x = np.random.randn(10, 10)
m = np.ma.masked_array(x, x < 0.5, fill_value=1)
print((m * m).fill_value)
# 1e+20
print(np.multiply(m, m).fill_value)
# 1.0

Prior to 1.10.0 I get a fill value of 1.0 in both cases, which seems much more reasonable to me. The behaviour in question was introduced in 3c6b6ba.
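
For code that has to behave the same way across these versions, a small wrapper can copy the fill value onto the result regardless of which code path produced it. This is only a sketch: keep_fill_value is a hypothetical helper, not a NumPy function, and it assumes the fill value of the first operand is the one to keep.

```python
import numpy as np

def keep_fill_value(result, source):
    """Hypothetical helper: copy source.fill_value onto a masked result."""
    # np.ma.masked is a shared constant, and plain scalars or ndarrays
    # have no fill_value, so return those unchanged.
    if result is np.ma.masked or not isinstance(result, np.ma.MaskedArray):
        return result
    result.fill_value = source.fill_value
    return result

x = np.random.default_rng(0).standard_normal((10, 10))
m = np.ma.masked_array(x, x < 0.5, fill_value=1)

product = keep_fill_value(m * m, m)
print(product.fill_value)
```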

@charris
Member
charris commented Feb 11, 2016

Probably related: #7122.

@alimuldal
Contributor

@charris Definitely looks that way. I also did a bisect and identified 3c6b6ba as the culprit.

@jjhelmus
Contributor
jjhelmus commented Apr 1, 2016

It looks like most (perhaps all) of the reduction operations do not preserve the fill_value attribute:

import numpy as np

a = np.ma.arange(12, fill_value=88).reshape(3, 4)
print(np.__version__)
print("Original fill_value:", a.fill_value)
print("a.sum fill_value:", a.sum(axis=1).fill_value)
print("a.cumsum fill_value:", a.cumsum().fill_value)

Output:

1.12.0.dev0+2af06c8
Original fill_value: 88
a.sum fill_value: 999999
a.cumsum fill_value: 999999

This could be fixed (and I'm willing to submit a PR) by explicitly setting the fill value of the returned array in the various MaskedArray methods. Is this the desired behavior, or should the current behavior of using the default fill_value for these arrays be maintained?
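
To make the proposal concrete, here is a rough sketch of what "explicitly setting the fill value of the returned array" could look like from user code. FillPreservingArray and _carry_fill are hypothetical names, and this subclass only illustrates the idea for sum and cumsum; it is not the actual NumPy patch.

```python
import numpy as np

class FillPreservingArray(np.ma.MaskedArray):
    """Hypothetical subclass (not part of NumPy) that keeps fill_value."""

    def _carry_fill(self, result):
        # Only array results carry a fill_value; leave scalars and the
        # np.ma.masked constant untouched.
        if isinstance(result, np.ma.MaskedArray) and result is not np.ma.masked:
            result.fill_value = self.fill_value
        return result

    def sum(self, *args, **kwargs):
        return self._carry_fill(super().sum(*args, **kwargs))

    def cumsum(self, *args, **kwargs):
        return self._carry_fill(super().cumsum(*args, **kwargs))

a = FillPreservingArray(np.arange(12).reshape(3, 4), fill_value=88)
row_sums = a.sum(axis=1)
running = a.cumsum()

print("Original fill_value:", a.fill_value)
print("a.sum fill_value:", row_sums.fill_value)
print("a.cumsum fill_value:", running.fill_value)
```

A fix inside NumPy would presumably set the attribute within the methods themselves rather than in a subclass; the override is used here only so the sketch runs against an unmodified installation.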

@sannant
sannant commented Mar 1, 2019

A related problem occurs when multiplying by a scalar.

Some example test code:

import numpy as np

h = np.array([0.0, 2, 3.2, 4.5])
print("h: {}".format(h))

for fill_value in [float("inf"), 1e20, 0]:
    print("\ntesting with fill_value={}".format(fill_value))
    hma = np.ma.masked_array(h, [True, False, True, False], fill_value=fill_value)
    print("hma: {}".format(repr(hma)))
    print("np.asarray(hma)): {}".format(np.asarray(hma)))
    print("1.0 * hmat: {}".format(repr(1.0 * hma)))
    print("np.asarray(1.0 * hma)): {}".format(np.asarray(1.0 * hma)))

And the output for fill_value 0:

testing with fill_value=0
hma: masked_array(data=[--, 2.0, --, 4.5],
             mask=[ True, False,  True, False],
       fill_value=0.0)
np.asarray(hma)): [0.  2.  3.2 4.5]
1.0 * hmat: masked_array(data=[--, 2.0, --, 4.5],
             mask=[ True, False,  True, False],
       fill_value=0.0)
np.asarray(1.0 * hma)): [1.  2.  1.  4.5]

The fill_value doesn't change according to hma.__repr__(), but when applying np.asarray the masked entries are systematically replaced by 1.

System information:

numpy: 1.16.0
python: Python 3.7.0
os: macOS 10.13.6
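
One thing that may help when reading the output above: np.asarray on a masked array returns the underlying data and ignores both the mask and the fill_value, so the values it shows at masked positions are whatever the operation left there. A minimal sketch, assuming the goal is a plain ndarray with the masked entries replaced by a chosen value, is to call filled() with an explicit value (the variable names are illustrative):

```python
import numpy as np

h = np.array([0.0, 2, 3.2, 4.5])
hma = np.ma.masked_array(h, [True, False, True, False], fill_value=0)
scaled = 1.0 * hma

# Underlying data: the mask and fill_value are not applied, so the
# entries at masked positions should not be relied upon.
raw = np.asarray(scaled)

# Masked entries replaced by an explicitly chosen value.
explicit = scaled.filled(0.0)

print("raw data:", raw)
print("filled:  ", explicit)
```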
