ENH: Add replace ufunc for bytes and unicode dtypes #25171
@seberg @ngoldbaum Another somewhat hacky ufunc for
I am generally OK with the hack here. Yes, the casting safety is side-stepped, but since we don't expose `out` if it is wrapped, that isn't a huge issue.
numpy/_core/umath.py
Outdated
counts = count(x1, x2, 0, numpy.iinfo(numpy.int_).max)
buffersizes = str_len(x1) + counts * (str_len(x3)-str_len(x2))
max_buffersize = numpy.max(buffersizes)
out = numpy.empty(x1.shape, dtype=f"{x1.dtype.char}{max_buffersize}")
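For readers following the diff, the buffer-size calculation above can be sketched with public NumPy functions only. This is a minimal illustration, not the PR's implementation: the helper name `replaced_width` is hypothetical, and `np.char.count` and `np.char.str_len` stand in for the internal `count` and `str_len` ufuncs.

```python
import numpy as np

def replaced_width(a, old, new):
    """Hypothetical helper: widest string that results from replacing
    every occurrence of `old` with `new` in each element of `a`."""
    a = np.asarray(a)
    # Number of non-overlapping occurrences of `old` in each element.
    counts = np.char.count(a, old)
    # Each replacement changes the length by len(new) - len(old).
    sizes = np.char.str_len(a) + counts * (len(new) - len(old))
    return int(np.max(sizes))

a = np.array(["That is a mango", "Monkeys eat mangos"])
width = replaced_width(a, "mango", "banana")
# Allocate a fixed-width output array large enough for every result.
out = np.empty(a.shape, dtype=f"{a.dtype.char}{width}")
```

The output dtype is sized from the largest element after replacement, so no result is truncated when written into `out`.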
The one caveat here is that for Nathan's new string dtype (or a user-defined one) this doesn't make sense, and we should only do it for the NumPy builtin strings (maybe in principle also for some other fixed-length ones, but that seems less important).
That's true. Do you think it'd be fine to leave it as is for now and deal with it once stringdtype is in? cc @ngoldbaum
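One way the caveat could be expressed is a guard on the dtype kind, so the buffer-size precomputation only runs for the builtin fixed-width string dtypes. This is only a sketch of the idea under discussion; the function name is hypothetical and nothing here is taken from the PR.

```python
import numpy as np

def is_builtin_fixed_width_string(dtype):
    """Hypothetical check: True only for NumPy's builtin bytes ("S")
    and unicode ("U") dtypes, which have a fixed item width."""
    return np.dtype(dtype).kind in ("S", "U")

checks = {
    "U5": is_builtin_fixed_width_string("U5"),
    "S3": is_builtin_fixed_width_string("S3"),
    "int64": is_builtin_fixed_width_string(np.int64),
    "object": is_builtin_fixed_width_string(object),
}
```

A variable-width or user-defined string dtype would fail this check and would need a different code path.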
Let's try to merge this before the 2.0 release candidate if possible.
Missed this, will try to look at it tomorrow. At the last community meeting Chuck said something about the middle of January at the earliest, so we have some time. Are you still planning to do an I'm also planning to try working on stringdtype ufuncs next week. I'll ping you if I run into trouble adapting your code to work with utf8.
Yup, I'll start working on it today.
It seems this PR is responsible for the following test failure across all CI jobs:
Test failure
=================================== FAILURES ===================================
_____________________ TestMethodsScalarValues.test_replace _____________________
self = <numpy._core.tests.test_defchararray.TestMethodsScalarValues object at 0x7f27b5cde3a0>
def test_replace(self):
> assert_equal(np.char.replace('Python is good', 'good', 'great'),
'Python is great')
self = <numpy._core.tests.test_defchararray.TestMethodsScalarValues object at 0x7f27b5cde3a0>
numpy/_core/tests/test_defchararray.py:736:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
a = 'Python is good', old = 'good', new = 'great', count = 9223372036854775807
@array_function_dispatch(_replace_dispatcher)
def replace(a, old, new, count=None):
"""
For each element in `a`, return a copy of the string with all
occurrences of substring `old` replaced by `new`.
Calls :meth:`str.replace` element-wise.
Parameters
----------
a : array-like of str or unicode
old, new : str or unicode
count : int, optional
If the optional argument `count` is given, only the first
`count` occurrences are replaced.
Returns
-------
out : ndarray
Output array of str or unicode, depending on input type
See Also
--------
str.replace
Examples
--------
>>> a = np.array(["That is a mango", "Monkeys eat mangos"])
>>> np.char.replace(a, 'mango', 'banana')
array(['That is a banana', 'Monkeys eat bananas'], dtype='<U19')
>>> a = np.array(["The dish is fresh", "This is it"])
>>> np.char.replace(a, 'is', 'was')
array(['The dwash was fresh', 'Thwas was it'], dtype='<U19')
"""
max_int64 = numpy.iinfo(numpy.int64).max
count = count if count is not None else max_int64
counts = numpy._core.umath.count(a, old, 0, max_int64)
buffersizes = (
numpy._core.umath.str_len(a)
+ counts * (numpy._core.umath.str_len(new) -
numpy._core.umath.str_len(old))
)
max_buffersize = numpy.max(buffersizes)
> out = numpy.empty(a.shape, dtype=f"{a.dtype.char}{max_buffersize}")
E AttributeError: 'str' object has no attribute 'shape'
Thanks! Should be fixed by #25484
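The traceback shows `a` arriving as a plain Python `str`, which has no `.shape` or `.dtype`. One plausible way to avoid this kind of error is to convert the input to an array before sizing the output; whether #25484 does exactly this is an assumption, and the helper name below is hypothetical.

```python
import numpy as np

def replace_output_spec(a, old, new):
    """Hypothetical helper: shape and dtype of the output buffer,
    accepting scalar strings as well as arrays."""
    # A plain str becomes a 0-d unicode array with shape ().
    a = np.asanyarray(a)
    old = np.asanyarray(old)
    new = np.asanyarray(new)
    counts = np.char.count(a, old)
    sizes = np.char.str_len(a) + counts * (
        np.char.str_len(new) - np.char.str_len(old)
    )
    width = int(np.max(sizes))
    return a.shape, np.dtype(f"{a.dtype.char}{width}")

shape, dtype = replace_output_spec("Python is good", "good", "great")
```

With the scalar input from the failing test, the conversion yields a 0-d array, so the shape and dtype lookups succeed.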