gh-117431: Optimize str.startswith by eendebakpt · Pull Request #117480 · python/cpython · GitHub

gh-117431: Optimize str.startswith #117480


Closed · wants to merge 2 commits
Conversation

eendebakpt
Copy link
Contributor
@eendebakpt eendebakpt commented Apr 2, 2024

We apply two optimizations in tailmatch (which is used in both str.startswith and str.endswith).

  • For one- and two-character arguments we avoid a call to memcmp, as all characters have already been checked.
  • In the call to memcmp we can reduce the number of bytes compared, since the first and last characters have already been checked.
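The two optimizations above can be sketched in Python (the real code is C operating on raw buffers; the function name and structure here are illustrative, not CPython's actual implementation):

```python
def tailmatch_startswith(s: str, sub: str) -> bool:
    """Illustrative sketch of the optimized tailmatch strategy for startswith.

    Mirrors the described C logic: check the first and last characters of
    the prefix region first, then compare only the middle slice (the
    reduced memcmp), skipping the memcmp entirely for short needles.
    """
    n = len(sub)
    if n == 0:
        return True
    if n > len(s):
        return False
    # Cheap first- and last-character checks, done before memcmp in C.
    if s[0] != sub[0] or s[n - 1] != sub[n - 1]:
        return False
    # For one- and two-character needles everything is already checked,
    # so the memcmp call can be skipped entirely.
    if n <= 2:
        return True
    # Otherwise compare only the middle, characters 1 .. n-2 (smaller memcmp).
    return s[1:n - 1] == sub[1:n - 1]
```

The behavior matches `str.startswith(sub)` for plain string arguments; only the amount of data compared changes.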

Notes:

Two possible optimizations not included in this PR:

  • For the single-character case we still do some double work, since PyUnicode_READ(kind_self, data_self, offset) == PyUnicode_READ(kind_sub, data_sub, 0) and PyUnicode_READ(kind_self, data_self, offset + end_sub) == PyUnicode_READ(kind_sub, data_sub, end_sub) compare the same characters in that case. We can eliminate that by adding something like

    int first_character_equal = PyUnicode_READ(kind_self, data_self, offset) == PyUnicode_READ(kind_sub, data_sub, 0);
    if (PyUnicode_GET_LENGTH(substring) == 1) {
        return first_character_equal;
    }
    ...
This makes the single-character case a bit faster, but the code a bit more complex.

  • We can make the number of bytes compared even smaller, but we would have to calculate a different offset, which does not seem worth the effort.
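The first of these possible optimizations can also be sketched in Python (again with illustrative names; the actual code is C):

```python
def tailmatch_single_char_variant(s: str, sub: str) -> bool:
    """Illustrative variant: store the result of the first comparison and
    return it directly for single-character needles, avoiding the
    duplicate read of the same character."""
    n = len(sub)
    if n == 0:
        return True
    if n > len(s):
        return False
    first_character_equal = s[0] == sub[0]
    if n == 1:
        # First and last characters coincide here, so no second check
        # (and no memcmp) is needed.
        return first_character_equal
    if not first_character_equal or s[n - 1] != sub[n - 1]:
        return False
    # Compare only the middle of the prefix region.
    return s[1:n - 1] == sub[1:n - 1]
```

The result is again equivalent to `str.startswith(sub)`; the variant only saves one character read in the single-character case, at the cost of slightly more branching.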

Benchmark (on top of #117466): python -m timeit -s "s = 'abcdef'" "s.startswith('a')"

main: 10000000 loops, best of 5: 27.2 nsec per loop
PR: 10000000 loops, best of 5: 26.3 nsec per loop

@erlend-aasland
Copy link
Contributor

Could you add the other optimisations as separate commits?

@serhiy-storchaka
Copy link
Member

The difference between 27.2 and 26.3 ns is too small and can be the result of unrelated factors. I get nanosecond-level variation when running the same command several times.

@erlend-aasland
Copy link
Contributor

The difference between 27.2 and 26.3 ns is too small and can be the result of unrelated factors. I get nanosecond-level variation when running the same command several times.

Yes, so I'm curious about the other two mentioned optimisations that are not (yet) part of this PR. Perhaps they have a greater impact.

@eendebakpt
Copy link
Contributor Author

The difference between 27.2 and 26.3 ns is too small and can be the result of unrelated factors. I get nanosecond-level variation when running the same command several times.

Yes, so I'm curious about the other two mentioned optimisations that are not (yet) part of this PR. Perhaps they have a greater impact.

I created a PR with the other approach: #117782.

@eendebakpt
Copy link
Contributor Author

Closing this in favor of the alternate PR.

@eendebakpt eendebakpt closed this May 11, 2024