gh-126024: Use only memcpy for unaligned loads in find_first_nonascii #127769

encukou · 2024-12-09T16:47:52Z

Sorry for getting to this late.

GH-126025 optimized the UTF-8 decoder with a fast find_first_nonascii function.
GH-127566 then fixed an alignment issue by switching one case to memcpy on all platforms, including those that previously used the load_unaligned helper (which does a byte-by-byte copy).

This switches the remaining load_unaligned call to memcpy, and removes the now-unused helper.
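For readers unfamiliar with the idiom, here is a minimal sketch of a memcpy-based unaligned load, assuming a plain C99 compiler. The names load_word and word_has_nonascii are hypothetical and are not taken from Objects/unicodeobject.c; the sketch only illustrates the general technique, not the actual patch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not CPython code): read one size_t from a pointer
   that may not be size_t-aligned. A fixed-size memcpy is well-defined for
   any alignment, and compilers commonly lower it to a single load. */
static size_t
load_word(const unsigned char *p)
{
    size_t u;
    memcpy(&u, p, sizeof(u));
    return u;
}

/* Hypothetical helper: nonzero if any of the sizeof(size_t) bytes
   starting at p has its high bit set, i.e. is not ASCII. */
static int
word_has_nonascii(const unsigned char *p)
{
    /* 0x8080...80 for any width of size_t. */
    const size_t mask = (size_t)-1 / 0xFF * 0x80;
    return (load_word(p) & mask) != 0;
}
```

Dereferencing a cast pointer such as *(const size_t *)p at an odd address is undefined behavior in C, which is why the copy goes through memcpy here.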

Issue: Improve UTF-8 decode speed #126024

methane · 2024-12-09T18:39:48Z

Did you benchmark it?
I didn't use memcpy because it is much slower than a manual copy.
As I remember, I tried switch-case + fixed-size memcpy, but it was not as fast as a manual copy on some compilers.

methane · 2024-12-10T07:45:34Z

result:

gcc 13.2.0
3. 0.007522980 ns (memcpy)
3. 0.006466835 ns (switch + memcpy)
4. 0.003683150 ns (switch + shift)
5. 0.003133516 ns (switch + union)

clang 18.1.3
3. 0.007062990 ns (memcpy)
3. 0.003032249 ns (switch + memcpy)
4. 0.002390183 ns (switch + shift)
5. 0.003310016 ns (switch + union)

code:
https://github.com/methane/notes/blob/02fdadf71aed28dd7b8a512b0ff2079f22033e55/c/first_nonascii/nonascii.c#L39-L150

The current implementation uses switch + union, but switch + shift might be better because we use clang for the JIT.
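As a rough illustration of two of the variants named above, the following sketch loads fewer than a full word of trailing bytes in two ways. It is not the benchmarked code from the linked file; the function names load_tail_memcpy and load_tail_shift and the fixed 64-bit word are assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* "memcpy" variant (hypothetical name): the length is only known at run
   time, so the compiler may emit a real call to memcpy. */
static uint64_t
load_tail_memcpy(const unsigned char *p, size_t n)
{
    assert(n <= 8);
    uint64_t u = 0;
    memcpy(&u, p, n);
    return u;
}

/* "switch + shift" variant (hypothetical name): each case adds one byte
   and falls through, so no library call is needed. The value is built in
   little-endian order regardless of the host byte order. */
static uint64_t
load_tail_shift(const unsigned char *p, size_t n)
{
    assert(n <= 8);
    uint64_t u = 0;
    switch (n) {
    case 8: u |= (uint64_t)p[7] << 56;
        /* fall through */
    case 7: u |= (uint64_t)p[6] << 48;
        /* fall through */
    case 6: u |= (uint64_t)p[5] << 40;
        /* fall through */
    case 5: u |= (uint64_t)p[4] << 32;
        /* fall through */
    case 4: u |= (uint64_t)p[3] << 24;
        /* fall through */
    case 3: u |= (uint64_t)p[2] << 16;
        /* fall through */
    case 2: u |= (uint64_t)p[1] << 8;
        /* fall through */
    case 1: u |= (uint64_t)p[0];
        /* fall through */
    default:
        break;
    }
    return u;
}
```

Both results can be tested against the mask 0x8080808080808080 to detect a non-ASCII byte; the two functions return the same value only on little-endian hosts, but the mask test gives the same answer on either byte order.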

methane · 2024-12-10T08:31:47Z

Objects/unicodeobject.c

@@ -5114,9 +5072,14 @@ find_first_nonascii(const unsigned char *start, const unsigned char *end)
            p += SIZEOF_SIZE_T;
        }
    }
+
+    // less than size_t bytes left.
+    assert(end - p <= SIZEOF_SIZE_T);


This assertion would fail when PY_LITTLE_ENDIAN && HAVE_CTZ is false, because that path breaks out of the word loop early and can leave more than size_t bytes:

                // big endian and minor compilers are difficult to test.
                // fallback to per byte check.
                break;
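To make the failure mode concrete, here is a simplified sketch of a scan that uses the same early-break fallback. It is an illustration only, assuming no ctz and any byte order; the name first_nonascii_portable is hypothetical, and returning end - start when no non-ASCII byte is found is an assumption of this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* 0x8080...80 for any width of size_t. */
#define WORD_MASK ((size_t)-1 / 0xFF * 0x80)

/* Hypothetical, simplified portable scan (not CPython code). */
static ptrdiff_t
first_nonascii_portable(const unsigned char *start, const unsigned char *end)
{
    const unsigned char *p = start;

    /* Word loop: test sizeof(size_t) bytes at a time. */
    while ((size_t)(end - p) >= sizeof(size_t)) {
        size_t u;
        memcpy(&u, p, sizeof(u));
        if (u & WORD_MASK) {
            /* Without ctz we cannot tell which byte matched, so stop here.
               Note that end - p may be far larger than sizeof(size_t) at
               this point, which is what would trip the assertion. */
            break;
        }
        p += sizeof(size_t);
    }

    /* Byte loop: handles both the short tail and the early break. */
    for (; p < end; p++) {
        if (*p & 0x80) {
            return p - start;
        }
    }
    return end - start;
}
```

Because the byte loop makes no assumption about how many bytes remain, this shape stays correct after the early break; an assertion that bounds end - p would only hold on the path where the word loop ran to completion.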

encukou · 2024-12-10T10:06:58Z

Ah, my benchmark was flawed. Sorry for the noise!

Use memcpy to avoid unaligned read in find_first_nonascii

ec25996

encukou requested a review from methane December 9, 2024 16:47

bedevere-app bot added the awaiting core review label Dec 9, 2024

bedevere-app bot mentioned this pull request Dec 9, 2024

Improve UTF-8 decode speed #126024

Closed

encukou added the skip news label Dec 9, 2024

methane reviewed Dec 10, 2024

encukou closed this Dec 10, 2024

encukou deleted the memcpy-unaligned branch December 11, 2024 14:03
