8000 gh-126024: Use only memcpy for unaligned loads in find_first_nonascii by encukou · Pull Request #127769 · python/cpython · GitHub
[go: up one dir, main page]

Skip to content

gh-126024: Use only memcpy for unaligned loads in find_first_nonascii #127769

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from

Conversation

encukou
Copy link
Member
@encukou encukou commented Dec 9, 2024

Sorry for getting to this late.

GH-126025 optimized the UTF-8 decoder with a fast find_first_nonascii function.
GH-127566 then fixed an alignment issue by switching one case to memcpy -- on all platforms, even those that used the load_unaligned helper (which does a byte-by-byte copy), to memcpy.

This switches the remaining load_unaligned call to memcpy, and removes the now-unused helper.

@methane
Copy link
Member
methane commented Dec 9, 2024

Did you benchmarked it?
I didn't use memcpy because it is much slower than manual copy.
As I remember, I tried switch-case + fixed size memcpy, but it is not as fast as manual copy on some compilers.

@methane
Copy link
Member
methane commented Dec 10, 2024

result:

gcc 13.2.0
3. 0.007522980 ns (memcpy)
3. 0.006466835 ns (switch + memcpy)
4. 0.003683150 ns (switch + shift)
5. 0.003133516 ns (switch + union)

clang 18.1.3
3. 0.007062990 ns (memcpy)
3. 0.003032249 ns (switch + memcpy)
4. 0.002390183 ns (switch + shift)
5. 0.003310016 ns (switch + union)

code:
https://github.com/methane/notes/blob/02fdadf71aed28dd7b8a512b0ff2079f22033e55/c/first_nonascii/nonascii.c#L39-L150

current implementation uses switch+union, but switch+shift might be better because we use clang for JIT.

@@ -5114,9 +5072,14 @@ find_first_nonascii(const unsigned char *start, const unsigned char *end)
p += SIZEOF_SIZE_T;
}
}

// less than size_t bytes left.
assert(end - p <= SIZEOF_SIZE_T);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This assertion would fail when not PY_LITTLE_ENDIAN && HAVE_CTZ case.

                // big endian and minor compilers are difficult to test.
                // fallback to per byte check.
                break;

@encukou
Copy link
Member Author
encukou commented Dec 10, 2024

Ah, my benchmark was flawed. Sorry for the noise!

@encukou encukou closed this Dec 10, 2024
@encukou encukou deleted the memcpy-unaligned branch December 11, 2024 14:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants
0