BUG: A segfault in chararray (2.0.0dev0 regression) · Issue #25513 · numpy/numpy

Closed
neutrinoceros opened this issue Dec 31, 2023 · 4 comments · Fixed by #25515

Comments

Contributor
neutrinoceros commented Dec 31, 2023

Describe the issue:

Astropy's tests against numpy's dev branch are currently choking on a segfault.
The issue can be boiled down to a pure numpy reproducer.

I haven't yet been able to set up my dev env properly to provide the full stack trace (I'm working on it), but I was able to bisect the problem to 19396d2 (#25171).

Reproduce the code example:

import numpy as np

carr = np.chararray((2,), itemsize=25)
carr[:] = [b'  4.52173913043478315E+00', b'  4.95652173913043548E+00']
carr.replace(b"E", b"D")
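Since a segfault kills the interpreter before any Python traceback is printed, one way to capture at least some trace information is to run the reproducer in a child process with `faulthandler` enabled. The following is a minimal sketch, not part of the original report; the helper name `run_isolated` is hypothetical, and a negative return code for a signal is POSIX behaviour.

```python
import subprocess
import sys

# The reproducer from this report, kept as a string so it runs in a child process.
REPRODUCER = """
import numpy as np
carr = np.chararray((2,), itemsize=25)
carr[:] = [b'  4.52173913043478315E+00', b'  4.95652173913043548E+00']
carr.replace(b"E", b"D")
"""


def run_isolated(code):
    """Run `code` in a fresh interpreter with faulthandler enabled.

    Hypothetical helper: a crash in the child does not take down the
    calling process, and faulthandler writes the Python-level stack of
    the crashing thread to stderr.
    """
    return subprocess.run(
        [sys.executable, "-X", "faulthandler", "-c", code],
        capture_output=True,
        text=True,
    )


if __name__ == "__main__":
    result = run_isolated(REPRODUCER)
    # On POSIX, a negative return code means the child died from a signal.
    print("return code:", result.returncode)
    print(result.stderr)
```

Because the corruption reported here does not always crash immediately, a return code of 0 from a single run does not prove the build is unaffected.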

Error message:

No response

Python and NumPy Versions:

2.0.0.dev0+git20231230.ee3124b
3.12.0 (main, Oct 27 2023, 11:50:57) [Clang 15.0.0 (clang-1500.0.40.1)]

Runtime Environment:

[{'numpy_version': '2.0.0.dev0+git20231231.f1fac82',
  'python': '3.12.1 (main, Dec 11 2023, 18:41:50) [Clang 15.0.0 '
            '(clang-1500.0.40.1)]',
  'uname': uname_result(system='Darwin', node='kwanzaabot.home', release='23.2.0', version='Darwin Kernel Version 23.2.0: Wed Nov 15 21:55:06 PST 2023; root:xnu-10002.61.3~2/RELEASE_ARM64_T6020', machine='arm64')},
 {'simd_extensions': {'baseline': ['NEON', 'NEON_FP16', 'NEON_VFPV4', 'ASIMD'],
                      'found': ['ASIMDHP'],
                      'not_found': ['ASIMDFHM']}}]
None

Context for the issue:

astropy/astropy#15797

Contributor
mhvk commented Dec 31, 2023

Trying a bit, the segfault does not always happen immediately, but the results definitely are not always correct:

carr = np.chararray((2,), itemsize=25)
carr[:] = [b'  4.52173913043478315E+00', b'  4.95652173913043548E+00']
carr.replace(b'E', b'D')
# array([b'  4.52173913043478315E+00', b'DDDDDDDDDDDDDDDDDDDDDD+00'],
#       dtype='|S25')

When there is no segfault, exiting IPython produces "double free or corruption (out)" (EDIT: or "malloc(): mismatching next->prev_size (unsorted)").
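For comparison, the result one would expect from the `replace` call above can be derived element by element with Python's own `bytes.replace`. This is only a reference sketch for checking output; the helper name `expected_replace` is hypothetical and does not use NumPy at all.

```python
def expected_replace(values, old, new):
    """Element-wise reference result built on bytes.replace (hypothetical helper)."""
    return [v.replace(old, new) for v in values]


values = [b'  4.52173913043478315E+00', b'  4.95652173913043548E+00']
expected = expected_replace(values, b"E", b"D")
for item in expected:
    print(item)
```

Only the single `E` in each element should change, and each element should keep its length of 25 bytes.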

Contributor
mhvk commented Dec 31, 2023

Found the culprit if not yet the solution - the special-cased single-character case, and specifically the case where memchr is invoked; see

inline Py_ssize_t
findchar(Buffer<enc> s, Py_ssize_t n, npy_ucs4 ch)
{
    Buffer<enc> p = s, e = s + n;
    if (n > MEMCHR_CUT_OFF) {
        p = s.buffer_memchr(ch, n);
        if (p.buf != NULL) {
            return (p - s);
        }
        return -1;
    }
    while (p < e) {
        if (*p == ch) {
            return (p - s);
        }
        p++;
    }
    return -1;
}

And indeed, .find() is broken too:

carr.find(b'E')
# array([0, 0])

Basically, s.buf gets overwritten to be the same as p.buf.
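The effect can be modelled in a few lines of Python. This is a toy sketch, not NumPy's code: the `Buffer` class and both `memchr_*` methods are hypothetical, and it assumes `buffer_memchr` moved the buffer it was called on before returning a copy, which is what "s.buf gets overwritten" suggests.

```python
import copy


class Buffer:
    """Toy stand-in for a pointer into a byte string (hypothetical)."""

    def __init__(self, data, pos=0):
        self.data = data
        self.pos = pos

    def __sub__(self, other):
        # Pointer difference, as in `p - s` in findchar.
        return self.pos - other.pos

    def memchr_mutating(self, ch, n):
        # Assumed buggy pattern: move *this* buffer, then return a copy of it.
        self.pos = self.data.find(ch, self.pos, self.pos + n)
        return copy.copy(self)

    def memchr_pure(self, ch, n):
        # Leave this buffer alone and return a new one at the match (-1 if none).
        return Buffer(self.data, self.data.find(ch, self.pos, self.pos + n))


def findchar_buggy(s, n, ch):
    p = s.memchr_mutating(ch, n)
    return p - s if p.pos >= 0 else -1


def findchar_fixed(s, n, ch):
    p = s.memchr_pure(ch, n)
    return p - s if p.pos >= 0 else -1


data = b'  4.52173913043478315E+00'
print(findchar_buggy(Buffer(data), len(data), b'E'))  # 0: s moved along with p
print(findchar_fixed(Buffer(data), len(data), b'E'))  # 21: the real offset of E
```

In the mutating variant `p - s` is always zero when a match exists, which matches the `array([0, 0])` result above.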

Contributor
mhvk commented Dec 31, 2023

Also, is it intended that an empty output is returned when the substring to be replaced is too long to possibly be present in the string?

In [1]: a = np.array('abc').view(np.char.chararray)

In [2]: a.replace('012345', '')
Out[2]: array('', dtype='<U3')

In [3]: 'abc'.replace('012345', '')
Out[3]: 'abc'
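The behaviour of `str.replace` shown in `Out[3]` can be written out as a short sketch: when the search string cannot occur, the early exit has to return the input unchanged rather than an empty result. The function name `replace_sketch` is hypothetical, and this is not the NumPy implementation.

```python
def replace_sketch(s, old, new):
    """Simplified model of str.replace for a non-empty `old` (hypothetical)."""
    if not old or len(old) > len(s):
        # No match is possible, so the output is the unchanged input.
        # Simplification: str.replace treats an empty `old` differently;
        # that case is out of scope for this sketch.
        return s
    out = []
    i = 0
    while i < len(s):
        if s.startswith(old, i):
            out.append(new)
            i += len(old)
        else:
            out.append(s[i])
            i += 1
    return "".join(out)


print(replace_sketch('abc', '012345', ''))  # abc
```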

Contributor
mhvk commented Dec 31, 2023

I think I have a fix in #25515 for all three issues that came up: the segfault, the wrong .find result, and the wrong .replace result.
