BUG: Fancy indexing with arrays of uint types fail on StringDType arrays · Issue #27710 · numpy/numpy · GitHub

BUG: Fancy indexing with arrays of uint types fail on StringDType arrays #27710

Closed
kolamp opened this issue Nov 7, 2024 · 4 comments · Fixed by #27715
Assignees
Labels
00 - Bug component: numpy.strings String dtypes and functions

Comments

@kolamp
kolamp commented Nov 7, 2024

Describe the issue:

When trying to index an array of type StringDType with an array of any unsigned integer type, it fails with a MemoryError (strangely):

MemoryError: Failed to load string in StringDType getitem
MemoryError: String deallocation failed in clear loop

It does work when the index array is cast to int.

Reproduce the code example:

import numpy as np
str_arr = np.array(["a" * 25], dtype=np.dtypes.StringDType())
idx = np.array([0], dtype=np.uint8)
print(str_arr[idx.astype(int)])
print(str_arr[idx])

Error message:

Traceback (most recent call last):
  File "test.py", line 10, in <module>
    print(str_arr[idx])
  File "lib/python3.10/site-packages/numpy/_core/arrayprint.py", line 1692, in _array_str_implementation
    return array2string(a, max_line_width, precision, suppress_small, ' ', "")
  File "lib/python3.10/site-packages/numpy/_core/arrayprint.py", line 776, in array2string
    return _array2string(a, options, separator, prefix)
  File "lib/python3.10/site-packages/numpy/_core/arrayprint.py", line 547, in wrapper
    return f(self, *args, **kwargs)
  File "lib/python3.10/site-packages/numpy/_core/arrayprint.py", line 580, in _array2string
    lst = _formatArray(a, format_function, options['linewidth'],
  File "lib/python3.10/site-packages/numpy/_core/arrayprint.py", line 935, in _formatArray
    return recurser(index=(),
  File "lib/python3.10/site-packages/numpy/_core/arrayprint.py", line 896, in recurser
    word = recurser(index + (-1,), next_hanging_indent, next_width)
  File "lib/python3.10/site-packages/numpy/_core/arrayprint.py", line 839, in recurser
    return format_function(a[index])
MemoryError: Failed to load string in StringDType getitem
MemoryError: String deallocation failed in clear loop

Python and NumPy Versions:

Python: 3.10.6
Numpy: 2.1.3

Runtime Environment:

[{'numpy_version': '2.1.3',
'python': '3.10.6 | packaged by conda-forge | (main, Aug 22 2022, 20:36:39) '
'[GCC 10.4.0]',
'uname': uname_result(system='Linux', node='...', release='5.15.133.1-microsoft-standard-WSL2', version='#1 SMP Thu Oct 5 21:02:42 UTC 2023', machine='x86_64')},
{'simd_extensions': {'baseline': ['SSE', 'SSE2', 'SSE3'],
'found': ['SSSE3',
'SSE41',
'POPCNT',
'SSE42',
'AVX',
'F16C',
'FMA3',
'AVX2',
'AVX512F',
'AVX512CD',
'AVX512_SKX',
'AVX512_CLX',
'AVX512_CNL',
'AVX512_ICL'],
'not_found': ['AVX512_KNL', 'AVX512_KNM']}},
{'architecture': 'SkylakeX',
'filepath': 'lib/python3.10/site-packages/numpy.libs/libscipy_openblas64_-ff651d7f.so',
'internal_api': 'openblas',
'num_threads': 16,
'prefix': 'libscipy_openblas',
'threading_layer': 'pthreads',
'user_api': 'blas',
'version': '0.3.27'}]

Context for the issue:

No response

@kolamp kolamp added the 00 - Bug label Nov 7, 2024
@ngoldbaum
Member

Thanks for the report! I can reproduce this.

MemoryError (strangely)

This indicates that the array buffer and arena allocation have somehow gone out of sync. Unfortunately there's a long tail of these we've had to deal with, because we keep finding places in the NumPy codebase that make (incorrect) assumptions about the output and input dtypes being identical in casts. It's probably another one of those.

@ngoldbaum ngoldbaum self-assigned this Nov 7, 2024
@ngoldbaum ngoldbaum changed the title Fancy indexing with arrays of uint types fail on StringDType arrays BUG: Fancy indexing with arrays of uint types fail on StringDType arrays Nov 7, 2024
@ngoldbaum
Member

assumptions about the output and input dtypes being identical in casts

if (PyArray_GetDTypeTransferFunction(1,
        itemsize, itemsize,
        PyArray_DESCR(self), PyArray_DESCR(self),
        0, &cast_info, &transfer_flags) != NPY_SUCCEED) {
    goto finish;
}

Like this one. I think changing it so that it no longer uses PyArray_DESCR(self) for both the input and output descriptors will probably fix this.

@ngoldbaum
Member

See #27715

@SamAdamDay

To add to this and relate to #27737, I get the error even with int type when the array is 2D:

import numpy as np
str_arr = np.array([["a" * 25]], dtype=np.dtypes.StringDType())
idx = np.array([0], dtype=int)
print(str_arr[idx])

Replacing 25 with 15 doesn't raise an error, however.
