BUG: `np.nonzero` outputs too large indices for boolean matrices · Issue #23196 · numpy/numpy · GitHub

Closed
@rank-and-files

Description

Describe the issue:

When calling np.nonzero on some boolean matrices, the returned indices are occasionally far larger than the array dimensions allow
(for example 1152921504606852819, when they should be smaller than 12000).

The code example below reproduces this bug.
In the script below, setting FORK to False still triggers the bug, but it takes longer (for me, up to around 17000 iterations with a similar script).
I tried this on two different machines, both running 20.04.1-Ubuntu with around 32 GB of RAM.
It happens with different numpy versions, including the latest one (1.24.2) and different ways of installing numpy (conda, pip and poetry).

With FORK = True in the script below, the error appears for me after as few as 20 iterations (stdout shows 20). If it does not, pressing Ctrl + C and trying again has proven effective.

Reproduce the code example:

import os

import numpy as np

FORK = True


def main():

    np.random.seed(4321)

    if FORK:
        pid = os.fork()
        np.random.seed(4321)
        if pid > 0:
            np.random.seed(1234)
            pid = os.fork()
            if pid > 0:
                np.random.seed(123)
                pid = os.fork()
                if pid > 0:
                    np.random.seed(321)
                    if pid > 0:
                        np.random.seed(12)
                        pid = os.fork()
                        if pid > 0:
                            np.random.seed(21)
                            pid = os.fork()
                            if pid > 0:
                                np.random.seed(1)

    count = 0
    while True:
        count += 1
        if count % 10 == 0:
            print(count)
        random_num_one = np.random.randint(6000, 8000)
        random_num_two = np.random.randint(10000, 12000)

        self_offsets = np.zeros((random_num_one, random_num_two, 2))

        random_arr = np.random.random((random_num_one, random_num_two))
        mask = random_arr >= 0.5
        ys_rel, xs_rel = np.nonzero(mask)

        if np.max(xs_rel) > random_num_two:
            raise Exception(f"This should not happen: {np.max(xs_rel)} > {random_num_two}")

        if np.max(ys_rel) > random_num_one:
            raise Exception(f"This should not happen: {np.max(ys_rel)} > {random_num_one}")


if __name__ == "__main__":
    main()

Error message:

Traceback (most recent call last):
  File "scripts/minimal_new.py", line 52, in <module>
    main()
  File "scripts/minimal_new.py", line 45, in main
    raise Exception(f"numpy does not work: {np.max(xs)} > {random_num_two}")
Exception: This should not happen: 1152921504606852819 > 10326
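
A quick look at the reported value may help narrow things down. The sketch below only does arithmetic on the number from the exception message above. It shows that the value equals a single high bit plus a small remainder that would itself be a valid column index. This is an observation about the number, not a diagnosis of the cause; the variable names are illustrative.

```python
# Arithmetic on the value from the exception message; nothing here calls numpy.
reported = 1152921504606852819   # index reported in the exception
column_count = 10326             # random_num_two from the same message

# Position of the highest set bit, and what remains once it is cleared.
top_bit = reported.bit_length() - 1
remainder = reported - (1 << top_bit)

print(f"highest set bit: {top_bit}")
print(f"remainder after clearing it: {remainder}")
print(f"remainder is a valid column index: {0 <= remainder < column_count}")
```

The remainder is 5843, which is below 10326, so the reported index differs from an in-range index by exactly bit 60.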

Runtime information:

Output of import sys, numpy; print(numpy.__version__); print(sys.version)

1.24.2
3.8.10 (default, Nov 14 2022, 12:59:47) 
[GCC 9.4.0]

Output of print(numpy.show_runtime())

[{'simd_extensions': {'baseline': ['SSE', 'SSE2', 'SSE3'],
                      'found': ['SSSE3',
                                'SSE41',
                                'POPCNT',
                                'SSE42',
                                'AVX',
                                'F16C',
                                'FMA3',
                                'AVX2'],
                      'not_found': ['AVX512F',
                                    'AVX512CD',
                                    'AVX512_KNL',
                                    'AVX512_KNM',
                                    'AVX512_SKX',
                                    'AVX512_CLX',
                                    'AVX512_CNL',
                                    'AVX512_ICL']}},
 {'architecture': 'Haswell',
  'filepath': '/home/benr/.cache/pypoetry/virtualenvs/projectname-6Jmumlav-py3.8/lib/python3.8/site-packages/numpy.libs/libopenblas64_p-r0-15028c96.3.21.so',
  'internal_api': 'openblas',
  'num_threads': 16,
  'prefix': 'libopenblas',
  'threading_layer': 'pthreads',
  'user_api': 'blas',
  'version': '0.3.21'}]

Operating system

Linux 5.15.0-58-generic #64~20.04.1-Ubuntu

Context for the issue:

The usage is in the context of data loading for the analysis of images (semantic segmentation).
I need the indices at which matrices of this kind are nonzero, so until this is resolved I am forced to rely on workarounds.
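
One possible shape for such a workaround is sketched below. It assumes the bad indices are transient, so that repeating the call can succeed; the report does not establish this. `checked_nonzero` and `max_attempts` are hypothetical names, not part of numpy. The helper validates every index against the mask shape and retries when the check fails.

```python
import numpy as np


def checked_nonzero(mask, max_attempts=3):
    """Hypothetical helper: np.nonzero plus a bounds check on the result.

    Assumes out-of-range indices are transient, so a repeated call may succeed.
    """
    mask = np.asarray(mask)
    for _ in range(max_attempts):
        indices = np.nonzero(mask)
        # A valid index along an axis satisfies 0 <= idx < mask.shape[axis].
        in_bounds = all(
            idx.size == 0 or (idx.min() >= 0 and idx.max() < dim)
            for idx, dim in zip(indices, mask.shape)
        )
        if in_bounds:
            return indices
    raise RuntimeError(
        f"np.nonzero returned out-of-range indices {max_attempts} times in a row"
    )


# Usage on a small mask; a well-behaved call returns the same result as np.nonzero.
example_mask = np.array([[True, False], [False, True]])
ys, xs = checked_nonzero(example_mask)
```

Note that the check uses a strict upper bound (`idx.max() < dim`), which is slightly tighter than the `>` comparison in the reproduction script.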
