Description
Describe the issue:
Trying to index a StringDType array of shape (1, 1), where the single string has length more than 15, using a list results in a MemoryError. This also happens when indexing with an array.
Specifically, this error appears when the resulting array is printed or, more directly, when it is accessed at (-1, -1).
Possibly related to #27710.
The issue does not appear when:
- The single string has length 15
- The array has shape (1,)
Additionally, I get SystemError: error return without exception set.
Reproduce the code example:
import numpy as np
from numpy.dtypes import StringDType
ok = np.array([["abcdefghijklmno"]], dtype=StringDType())
bad = np.array([["abcdefghijklmnop"]], dtype=StringDType())
ok[[0]][-1, -1]
bad[[0]][-1, -1]
# These also raise errors:
bad[np.array([0])][-1, -1]
repr(bad[[0]])
# However this does not:
ok_2 = np.array(["abcdefghijklmnop"], dtype=StringDType())
repr(ok_2[[0]])
Error message:
Traceback (most recent call last):
  File "bug.py", line 8, in <module>
    bad[[0]][-1, -1]
    ~~~~~~~~^^^^^^^^
MemoryError: Failed to load string in StringDType getitem

Traceback (most recent call last):
  File "bug.py", line 8, in <module>
    bad[[0]][-1, -1]
    ~~~~~~~~^^^^^^^^
SystemError: error return without exception set
Python and NumPy Versions:
2.2.0.dev0+git20241111.fd4f467
3.12.3 (main, Sep 11 2024, 14:17:37) [GCC 13.2.0]
Runtime Environment:
[{'numpy_version': '2.2.0.dev0+git20241111.fd4f467',
  'python': '3.12.3 (main, Sep 11 2024, 14:17:37) [GCC 13.2.0]',
  'uname': uname_result(system='Linux', node='laozi', release='6.8.0-47-generic', version='#47-Ubuntu SMP PREEMPT_DYNAMIC Fri Sep 27 21:40:26 UTC 2024', machine='x86_64')},
 {'simd_extensions': {'baseline': ['SSE', 'SSE2', 'SSE3'],
                      'found': ['SSSE3',
                                'SSE41',
                                'POPCNT',
                                'SSE42',
                                'AVX',
                                'F16C',
                                'FMA3',
                                'AVX2'],
                      'not_found': ['AVX512F',
                                    'AVX512CD',
                                    'AVX512_KNL',
                                    'AVX512_KNM',
                                    'AVX512_SKX',
                                    'AVX512_CLX',
                                    'AVX512_CNL',
                                    'AVX512_ICL',
                                    'AVX512_SPR']}}]
Context for the issue:
I found the bug because I wanted to randomly permute an array of strings. When I tried to print the result, I got a memory error.
Being able to index a StringDType array using an array seems like important functionality, because I can't think of a workaround other than iterating over the index array using a Python loop.