gh-117431: Optimize str.startswith by eendebakpt · Pull Request #117480 · python/cpython · GitHub

gh-117431: Optimize str.startswith #117480


Closed · wants to merge 2 commits
Conversation

eendebakpt
Copy link
Contributor
@eendebakpt eendebakpt commented Apr 2, 2024

We apply two optimizations in tailmatch (which is used in both str.startswith and str.endswith).

  • For one- and two-character arguments we avoid a call to memcmp, as all characters have already been checked.
  • In the call to memcmp we can reduce the number of bytes compared, since the first and last characters have already been checked.
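The two optimizations above can be sketched in Python (the real code is C operating on raw buffers; the function name and structure here are illustrative, not CPython's actual implementation):

```python
def tailmatch_startswith(s: str, sub: str) -> bool:
    """Illustrative sketch of the optimized tailmatch strategy for startswith.

    Mirrors the described C logic: check the first and last characters of
    the prefix region first, then compare only the middle slice (the
    reduced memcmp), skipping the memcmp entirely for short needles.
    """
    n = len(sub)
    if n == 0:
        return True
    if n > len(s):
        return False
    # Cheap first- and last-character checks, done before memcmp in C.
    if s[0] != sub[0] or s[n - 1] != sub[n - 1]:
        return False
    # For one- and two-character needles everything is already checked,
    # so the memcmp call can be skipped entirely.
    if n <= 2:
        return True
    # Otherwise compare only the middle, characters 1 .. n-2 (smaller memcmp).
    return s[1:n - 1] == sub[1:n - 1]
```

The behavior matches `str.startswith(sub)` for plain string arguments; only the amount of data compared changes.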

Notes:

Two possible optimizations not included in this PR:

  • For the single-character case we still do some double work, since PyUnicode_READ(kind_self, data_self, offset) == PyUnicode_READ(kind_sub, data_sub, 0) and PyUnicode_READ(kind_self, data_self, offset + end_sub) == PyUnicode_READ(kind_sub, data_sub, end_sub) compare the same characters in that case. We can eliminate that by adding something like

    int first_character_equal = PyUnicode_READ(kind_self, data_self, offset) == PyUnicode_READ(kind_sub, data_sub, 0);
    if (PyUnicode_GET_LENGTH(substring) == 1) {
        return first_character_equal;
    }
    ...
This makes the single-character case a bit faster, but the code a bit more complex.

  • We can make the number of bytes compared even smaller, but we would have to calculate a different offset, which does not seem worth the effort.
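The first of these possible optimizations can also be sketched in Python (again with illustrative names; the actual code is C):

```python
def tailmatch_single_char_variant(s: str, sub: str) -> bool:
    """Illustrative variant: store the result of the first comparison and
    return it directly for single-character needles, avoiding the
    duplicate read of the same character."""
    n = len(sub)
    if n == 0:
        return True
    if n > len(s):
        return False
    first_character_equal = s[0] == sub[0]
    if n == 1:
        # First and last characters coincide here, so no second check
        # (and no memcmp) is needed.
        return first_character_equal
    if not first_character_equal or s[n - 1] != sub[n - 1]:
        return False
    # Compare only the middle of the prefix region.
    return s[1:n - 1] == sub[1:n - 1]
```

The result is again equivalent to `str.startswith(sub)`; the variant only saves one character read in the single-character case, at the cost of slightly more branching.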

Benchmark (on top of #117466): python -m timeit -s "s = 'abcdef'" "s.startswith('a')"

main: 10000000 loops, best of 5: 27.2 nsec per loop
PR: 10000000 loops, best of 5: 26.3 nsec per loop

@erlend-aasland
Copy link
Contributor

Could you add the other optimisations as separate commits?

@serhiy-storchaka
Copy link
Member

The difference between 27.2 and 26.3 ns is too small and can be the result of unrelated factors. I get nanosecond-level variation when running the same command several times.

@erlend-aasland
Copy link
Contributor

The difference between 27.2 and 26.3 ns is too small and can be the result of unrelated factors. I get nanosecond-level variation when running the same command several times.

Yes, so I'm curious about the other two mentioned optimisations that are not (yet) part of this PR. Perhaps they have a greater impact.

@eendebakpt
Copy link
Contributor Author

The difference between 27.2 and 26.3 ns is too small and can be the result of unrelated factors. I get nanosecond-level variation when running the same command several times.

Yes, so I'm curious about the other two mentioned optimisations that are not (yet) part of this PR. Perhaps they have a greater impact.

I created a PR with the other approach: #117782.

@eendebakpt
Copy link
Contributor Author

Closing this in favor of the alternate PR.

@eendebakpt eendebakpt closed this May 11, 2024