BUG: MemoryError when indexing 2D StringDType array with a list index · Issue #27737 · numpy/numpy · GitHub
BUG: MemoryError when indexing 2D StringDType array with a list index #27737
Closed
@SamAdamDay

Description

Describe the issue:

Indexing a StringDType array of shape (1, 1) with a list raises a MemoryError when the array's single string is longer than 15 characters. The same happens when indexing with an array.

Specifically, the error appears when the indexed array is printed or, more directly, when its element at (-1, -1) is accessed.

Possibly related to #27710.

The issue does not appear when:

  • The single string has length 15
  • The array has shape (1, )

Additionally, I get SystemError: error return without exception set.

Reproduce the code example:

import numpy as np
from numpy.dtypes import StringDType

ok  = np.array([["abcdefghijklmno"]], dtype=StringDType())   # 15 characters
bad = np.array([["abcdefghijklmnop"]], dtype=StringDType())  # 16 characters

ok[[0]][-1, -1]   # works
bad[[0]][-1, -1]  # raises MemoryError

# These also raise errors:
bad[np.array([0])][-1, -1]
repr(bad[[0]])

# However this does not:
ok_2 = np.array(["abcdefghijklmnop"], dtype=StringDType())
repr(ok_2[[0]])

Error message:

Traceback (most recent call last):
  File "bug.py", line 8, in <module>
    bad[[0]][-1, -1]
    ~~~~~~~~^^^^^^^^
MemoryError: Failed to load string in StringDType getitem
Traceback (most recent call last):
  File "bug.py", line 8, in <module>
    bad[[0]][-1, -1]
    ~~~~~~~~^^^^^^^^
SystemError: error return without exception set

Python and NumPy Versions:

2.2.0.dev0+git20241111.fd4f467
3.12.3 (main, Sep 11 2024, 14:17:37) [GCC 13.2.0]

Runtime Environment:

[{'numpy_version': '2.2.0.dev0+git20241111.fd4f467',
'python': '3.12.3 (main, Sep 11 2024, 14:17:37) [GCC 13.2.0]',
'uname': uname_result(system='Linux', node='laozi', release='6.8.0-47-generic', version='#47-Ubuntu SMP PREEMPT_DYNAMIC Fri Sep 27 21:40:26 UTC 2024', machine='x86_64')},
{'simd_extensions': {'baseline': ['SSE', 'SSE2', 'SSE3'],
'found': ['SSSE3',
'SSE41',
'POPCNT',
'SSE42',
'AVX',
'F16C',
'FMA3',
'AVX2'],
'not_found': ['AVX512F',
'AVX512CD',
'AVX512_KNL',
'AVX512_KNM',
'AVX512_SKX',
'AVX512_CLX',
'AVX512_CNL',
'AVX512_ICL',
'AVX512_SPR']}}]

Context for the issue:

I found the bug because I wanted to randomly permute an array of strings. When I tried to print the result, I got a memory error.

Being able to index a StringDType array using an array seems like important functionality, because I can't think of a workaround other than iterating over the index array using a Python loop.
