BUG: masked std and median on unmasked array result in invalid masked array · Issue #24525 · numpy/numpy

Open
maxnoe opened this issue Aug 24, 2023 · 3 comments

Comments

@maxnoe
maxnoe commented Aug 24, 2023

Describe the issue:

Since version 1.24, the code example below produces a masked array whose data array and mask array have different shapes.

Reproduce the code example:

import numpy as np
print(np.__version__)

rng = np.random.default_rng(0)

data = rng.normal(size=(2, 101))

data[:, 2] = np.nan

std = np.ma.std(data, axis=1)
median = np.ma.median(data, axis=1)

print("median:")
print(repr(median))
print("std:")
print(repr(std))

deviation = data - median[:, np.newaxis]

comparison = deviation < 0.5 * std[:, np.newaxis]

print(comparison.shape, comparison.mask.shape)
print(comparison)

Error message:

Output under 1.23:

1.23.5
median:
masked_array(data=[nan, nan],
             mask=False,
       fill_value=1e+20)
std:
masked_array(data=[--, --],
             mask=[ True,  True],
       fill_value=1e+20,
            dtype=float64)
(2, 101) (2, 101)
[[-- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
  -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
  -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
  -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
  -- -- -- --]
 [-- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
  -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
  -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
  -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
  -- -- -- --]]

Output under 1.25 (also 1.24):

1.25.2
median:
masked_array(data=[nan, nan],
             mask=False,
       fill_value=1e+20)
std:
masked_array(data=[--, --],
             mask=[ True,  True],
       fill_value=1e+20,
            dtype=float64)
(2, 101) (2, 1)
Traceback (most recent call last):
  File "/home/mnoethe/test_numpy_ma_std.py", line 23, in <module>
    print(comparison)
  File "/home/mnoethe/.local/conda/envs/numpy-1.25/lib/python3.10/site-packages/numpy/ma/core.py", line 3997, in __str__
    return str(self._insert_masked_print())
  File "/home/mnoethe/.local/conda/envs/numpy-1.25/lib/python3.10/site-packages/numpy/ma/core.py", line 3991, in _insert_masked_print
    _recursive_printoption(res, mask, masked_print_option)
  File "/home/mnoethe/.local/conda/envs/numpy-1.25/lib/python3.10/site-packages/numpy/ma/core.py", line 2437, in _recursive_printoption
    np.copyto(result, printopt, where=mask)
ValueError: could not broadcast where mask from shape (2,2) into shape (2,100)

Runtime information:

[{'numpy_version': '1.25.2',
'python': '3.10.12 | packaged by conda-forge | (main, Jun 23 2023, 22:40:32) '
'[GCC 12.3.0]',
'uname': uname_result(system='Linux', node='e5b-dell-12', release='5.14.0-1051-oem', version='#58-Ubuntu SMP Fri Aug 26 05:50:00 UTC 2022', machine='x86_64')},
{'simd_extensions': {'baseline': ['SSE', 'SSE2', 'SSE3'],
'found': ['SSSE3',
'SSE41',
'POPCNT',
'SSE42',
'AVX',
'F16C',
'FMA3',
'AVX2'],
'not_found': ['AVX512F',
'AVX512CD',
'AVX512_SKX',
'AVX512_CLX',
'AVX512_CNL',
'AVX512_ICL',
'AVX512_SPR']}},
{'architecture': 'Haswell',
'filepath': '/home/mnoethe/.local/conda/envs/numpy-1.25/lib/libopenblasp-r0.3.23.so',
'internal_api': 'openblas',
'num_threads': 20,
'prefix': 'libopenblas',
'threading_layer': 'pthreads',
'user_api': 'blas',
'version': '0.3.23'}]

Context for the issue:

Most confusingly, the example above works fine with numpy 1.25 if the shape of the data array is (2, 100) (just one element smaller in the last dimension).
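
The shape mismatch can be detected without printing the array, and the mask can be rebuilt by broadcasting it to the data shape. The following is only a sketch of that idea: the helper names `mask_matches_data` and `with_broadcast_mask` are hypothetical and not part of NumPy, and whether the first check reports a mismatch depends on the installed NumPy version.

```python
import numpy as np


def mask_matches_data(arr):
    """Return True if the mask is absent or has the same shape as the data."""
    mask = np.ma.getmask(arr)
    return mask is np.ma.nomask or mask.shape == np.shape(arr)


def with_broadcast_mask(arr):
    """Rebuild a masked array whose mask is broadcast to the data shape."""
    data = np.ma.getdata(arr)
    mask = np.broadcast_to(np.ma.getmaskarray(arr), data.shape)
    # Copy so the new mask is a writeable array, not a read-only view.
    return np.ma.masked_array(data, mask=mask.copy())


# Same setup as the reproducer above.
rng = np.random.default_rng(0)
data = rng.normal(size=(2, 101))
data[:, 2] = np.nan

std = np.ma.std(data, axis=1)
median = np.ma.median(data, axis=1)
comparison = (data - median[:, np.newaxis]) < 0.5 * std[:, np.newaxis]

# Version-dependent: reports whether this NumPy shows the mismatch.
print(mask_matches_data(comparison))

fixed = with_broadcast_mask(comparison)
print(fixed.shape, fixed.mask.shape)
```

This only repairs the symptom after the fact; it does not change how the comparison itself builds its mask.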

@lvllvl
Contributor
lvllvl commented Sep 2, 2023

I'm not entirely sure if this solves your problem but I think this is resolved in the most recent 2.0.0.dev0+git20230830.b73a5ae version.
[Screenshot, 2023-09-01 8:01:05 PM]

@maxnoe
Author
maxnoe commented Sep 2, 2023

Why do std and median have different masks?

Why is the median NaN unmasked but the std masked?
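
One way to make the two reductions see the same mask is to mask the NaNs explicitly before reducing. This is a minimal sketch that assumes the NaNs are meant to be treated as missing values, which the report does not state; the variable names follow the reproducer, and `masked_data` is introduced here for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(size=(2, 101))
data[:, 2] = np.nan

# Assumption: NaNs should be ignored, so mask them up front.
masked_data = np.ma.masked_invalid(data)

std = np.ma.std(masked_data, axis=1)
median = np.ma.median(masked_data, axis=1)

deviation = masked_data - median[:, np.newaxis]
comparison = deviation < 0.5 * std[:, np.newaxis]

# The full-shape mask of the input carries through to the result.
print(comparison.shape, np.ma.getmaskarray(comparison).shape)
```

Note that this changes the semantics compared with the reproducer: std and median are computed over the 100 non-NaN values per row instead of propagating NaN.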

@melissawm melissawm added the component: numpy.ma masked arrays label Sep 4, 2023
@fengluoqiuwu
Contributor

I noticed this bug and I’d like to take a closer look and see if I can provide a solution. It might be caused by an issue in sqrt. I’ll work on a potential fix and submit a PR if I make progress.
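
As a small illustration of the sqrt behaviour mentioned above, the sketch below compares `np.sqrt` with `np.ma.sqrt` on an input containing NaN. It is only meant to show how the masked square root treats non-finite results and does not establish the root cause of this issue; the variable names are chosen for the example.

```python
import numpy as np

values = np.array([np.nan, 4.0])

plain = np.sqrt(values)      # ordinary ufunc: NaN stays in the data
masked = np.ma.sqrt(values)  # masked ufunc: the NaN result is masked

print(repr(plain))
print(repr(masked))
```

If std is computed as the masked square root of the variance, a NaN variance would end up masked in the same way, while a median computed on plain data would keep its NaN unmasked.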
