Improve performances of `fnmatch.translate` · Issue #122288 · python/cpython · GitHub

Improve performances of fnmatch.translate #122288


Closed
picnixz opened this issue Jul 25, 2024 · 7 comments
Labels
performance Performance or resource usage stdlib Python modules in the Lib dir type-feature A feature request or enhancement

Comments

@picnixz
Member
picnixz commented Jul 25, 2024

Feature or enhancement

Proposal:

I implemented fnmatch.translate and fnmatch.filter in C in #121446, but the performance gains for fnmatch.filter are less pronounced than those for fnmatch.translate. While I believe that the 2x improvement brought by the C implementation is worthwhile for fnmatch.translate, it comes at a high maintenance cost. Instead, I translated (no pun intended) the implementation into Python and obtained the following timings (PGO build):

+------------------------------------------+-----------------------+-----------------------+
| Benchmark                                | fnmatch-translate-ref | fnmatch-translate-py  |
+==========================================+=======================+=======================+
| abc/[!]b-ac-z9-1]/def/\*?/*/**c/?*[][!]  | 6.09 us               | 3.99 us: 1.53x faster |
+------------------------------------------+-----------------------+-----------------------+
| !abc/[!]b-ac-z9-1]/def/\*?/*/**c/?*[][!] | 6.39 us               | 4.07 us: 1.57x faster |
+------------------------------------------+-----------------------+-----------------------+
| a**?**cd**?**??k***                      | 2.24 us               | 1.51 us: 1.49x faster |
+------------------------------------------+-----------------------+-----------------------+
| a/**/b/**/c                              | 1.97 us               | 1.12 us: 1.76x faster |
+------------------------------------------+-----------------------+-----------------------+
| man/man1/bash.1                          | 3.00 us               | 1.21 us: 2.48x faster |
+------------------------------------------+-----------------------+-----------------------+
| a*b*c*d*e*f*g*h*i*j*k*l*m*n*o**          | 5.40 us               | 3.33 us: 1.62x faster |
+------------------------------------------+-----------------------+-----------------------+
| Geometric mean                           | (ref)                 | 1.71x faster          |
+------------------------------------------+-----------------------+-----------------------+

It's not as good as the benchmarks reported in #121445, but I think this could be considered. Note that the improvements come from a new (but equivalent) algorithm in the translation phase. Currently, the translation algorithm:

  • Does not compress consecutive '*' into one within the same loop iteration; it uses a sentinel object to indicate a wildcard instead.
  • Appends re.escape(c) to a list for every non-special character (i.e., anything that does not start a ?, * or [ construct).
  • In a second phase, merges the groups delimited by the sentinels in order to produce the final regex.

Our implementation (Barney's and mine) instead does the following:

  • Caches re.escape so that the lookup is faster.
  • Remembers the positions of '*' in the split instead of using a sentinel object.
    That way, parts can easily be merged during the second phase (technically, it'd be possible to do it in one phase, but two passes make it clearer).
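The two-pass idea can be sketched as follows. This is a simplified illustration, not the actual patch: it ignores `[...]` character classes and joins the chunks with a plain `.*` in the second pass, where the real implementation emits more careful groups.

```python
import re

def translate_sketch(pat):
    """Simplified two-pass sketch; ignores '[...]' classes on purpose."""
    # Pass 1: split the pattern into chunks, remembering the *positions*
    # of '*' chunks instead of storing a sentinel object.
    parts, star_positions = [], []
    literal = []                      # pending literal characters
    i, n = 0, len(pat)
    while i < n:
        c = pat[i]
        i += 1
        if c == '*':
            if literal:
                parts.append(re.escape(''.join(literal)))
                literal.clear()
            star_positions.append(len(parts))
            parts.append('*')
            while i < n and pat[i] == '*':   # compress runs of '*'
                i += 1
        elif c == '?':
            if literal:
                parts.append(re.escape(''.join(literal)))
                literal.clear()
            parts.append('.')
        else:
            literal.append(c)
    if literal:
        parts.append(re.escape(''.join(literal)))
    # Pass 2: merge the chunks between the recorded star positions.
    if not star_positions:
        return rf"(?s:{''.join(parts)})\Z"
    res = ''.join(parts[:star_positions[0]])
    boundaries = star_positions[1:] + [len(parts)]
    for prev, nxt in zip(star_positions, boundaries):
        res += '.*' + ''.join(parts[prev + 1:nxt])
    return rf"(?s:{res})\Z"
```

For example, `translate_sketch('a*b?c')` yields `(?s:a.*b.c)\Z`; the real fnmatch.translate additionally handles character classes and guards inner stars against pathological backtracking.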

Has this already been discussed elsewhere?

No response

Links to previous discussion of this feature:

No response

Linked PRs

@picnixz picnixz added type-feature A feature request or enhancement performance Performance or resource usage labels Jul 25, 2024
@picnixz picnixz added the stdlib Python modules in the Lib dir label Jul 25, 2024
@barneygale
Contributor
  • Instead of having many chunks with 1 or 2 characters, I try to have as many chunks with strings that are already formatted, e.g., ["abc", "*"] instead of ["a", "b", "c", SENTINEL].

The current implementation does this to minimise string concatenation, i.e. it defers join()ing until it's unavoidable. This should be faster, so I'm surprised you're getting different results! It's quite possible I've misunderstood your patch, though.

@picnixz
Member Author
picnixz commented Aug 15, 2024

Thanks for noticing this. I think there are two things to consider:

Calling re.escape(c) at every loop iteration (most of the time we don't have *?[ but rather "normal" characters to escape) is probably more costly than escaping in batches. Each call performs an isinstance check and a call to str.translate, and looking at the implementation of str.translate, every call allocates a writer on the heap, which slows down the entire algorithm. In particular, if we need to call res.append(re.escape(c)) many times in a row (which is what happens in practice), then it's probably faster to do res.append(re.escape(''.join(x))) or res.extend(re.escape(c) for c in x).

My benchmarks show that res.append(re.escape(''.join(x))) is faster than res.extend(re.escape(c) for c in x). I suspect this is because of the per-call overhead of re.escape.
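The difference between the two calling patterns can be reproduced with a small micro-benchmark (timings vary by machine; the pattern string is just an example taken from the tables above):

```python
import re
import timeit

chars = list('man/man1/bash.1')   # a typical run of pending literal chars

def one_join():
    # one re.escape call over the joined pending characters
    return [re.escape(''.join(chars))]

def per_char():
    # one re.escape call per character
    res = []
    res.extend(re.escape(c) for c in chars)
    return res

# Both variants produce the same escaped text...
assert ''.join(one_join()) == ''.join(per_char()) == re.escape('man/man1/bash.1')

# ...but the per-character variant pays re.escape's isinstance check and
# str.translate writer allocation once per character:
t_join = timeit.timeit(one_join, number=10_000)
t_char = timeit.timeit(per_char, number=10_000)
```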

Now, there is one case where the new algorithm might be slower: when there is only a single character in the pending list of characters to escape, namely x = [c]. You can see that the improvements are less pronounced for the case a**?**cd**?**??k***, but it is still faster. I've added another benchmark to illustrate an extreme case:

+---------------------------------+-----------------------+-----------------------+
| Benchmark                       | fnmatch-translate-ref | fnmatch-translate-py  |
+=================================+=======================+=======================+
| a*b*c*d*e*f*g*h*i*j*k*l*m*n*o** | 5.09 us               | 4.95 us: 1.03x faster |
+---------------------------------+-----------------------+-----------------------+

Here the timings are almost identical, which is what I would have expected. I've also added a comment:

# Handling '?' one at a time seems to be more efficient
# even if there are consecutive '?' that could have
# been written directly.

The reason why it's more efficient is that the fast path is actually rarer than the slow path, so looping over consecutive '?' is heuristically less efficient. The same analysis could be applied to consecutive '*', so I compared two implementations: the one in the PR, and one that keeps the treatment of '*' by not advancing the index in the if c == '*' branch but guarding the add(STAR) with another if. Benchmarks suggest that using a while to skip over successive '*' is slightly faster in the other cases (the case a*b*c*d*e*f*g*h*i*j*k*l*m*n*o** is obviously slower with a while loop, but not by much):

With an inner while-loop to skip over consecutive '*' (PR implementation)

+------------------------------------------+-----------------------+-----------------------+
| Benchmark                                | fnmatch-translate-ref | fnmatch-translate-py  |
+==========================================+=======================+=======================+
| abc/[!]b-ac-z9-1]/def/\*?/*/**c/?*[][!]  | 6.06 us               | 4.92 us: 1.23x faster |
+------------------------------------------+-----------------------+-----------------------+
| !abc/[!]b-ac-z9-1]/def/\*?/*/**c/?*[][!] | 6.37 us               | 5.06 us: 1.26x faster |
+------------------------------------------+-----------------------+-----------------------+
| a**?**cd**?**??k***                      | 2.21 us               | 2.00 us: 1.10x faster |
+------------------------------------------+-----------------------+-----------------------+
| a/**/b/**/c                              | 1.99 us               | 1.56 us: 1.28x faster |
+------------------------------------------+-----------------------+-----------------------+
| man/man1/bash.1                          | 3.04 us               | 1.50 us: 2.03x faster |
+------------------------------------------+-----------------------+-----------------------+
| a*b*c*d*e*f*g*h*i*j*k*l*m*n*o**          | 5.40 us               | 5.25 us: 1.03x faster |
+------------------------------------------+-----------------------+-----------------------+
| Geometric mean                           | (ref)                 | 1.29x faster          |
+------------------------------------------+-----------------------+-----------------------+
With a single if to skip over consecutive '*' (same as the ref implementation)

+------------------------------------------+-----------------------+-----------------------+
| Benchmark                                | fnmatch-translate-ref | fnmatch-translate-py  |
+==========================================+=======================+=======================+
| abc/[!]b-ac-z9-1]/def/\*?/*/**c/?*[][!]  | 6.09 us               | 5.13 us: 1.19x faster |
+------------------------------------------+-----------------------+-----------------------+
| !abc/[!]b-ac-z9-1]/def/\*?/*/**c/?*[][!] | 6.36 us               | 5.23 us: 1.22x faster |
+------------------------------------------+-----------------------+-----------------------+
| a/**/b/**/c                              | 2.01 us               | 1.64 us: 1.22x faster |
+------------------------------------------+-----------------------+-----------------------+
| man/man1/bash.1                          | 3.10 us               | 1.48 us: 2.09x faster |
+------------------------------------------+-----------------------+-----------------------+
| a*b*c*d*e*f*g*h*i*j*k*l*m*n*o**          | 5.44 us               | 5.14 us: 1.06x faster |
+------------------------------------------+-----------------------+-----------------------+
| Geometric mean                           | (ref)                 | 1.26x faster          |
+------------------------------------------+-----------------------+-----------------------+

Benchmark hidden because not significant (1): a**?**cd**?**??k***

One reason is that the inner while saves a check on whether the pending characters need to be flushed, and lets us jump straight to the next character that is not '*'.
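The two variants under discussion can be illustrated with a minimal scanning loop (function names are illustrative, not the PR's code):

```python
def scan_with_while(pat):
    """Skip runs of '*' with an inner while (the PR's approach)."""
    out, i, n = [], 0, len(pat)
    while i < n:
        c = pat[i]
        i += 1
        if c == '*':
            out.append('*')                 # emit one star for the whole run
            while i < n and pat[i] == '*':  # jump straight past the run
                i += 1
        else:
            out.append(c)
    return out

def scan_with_if(pat):
    """Guard each '*' with an if against the previous part (ref approach)."""
    out, i, n = [], 0, len(pat)
    while i < n:
        c = pat[i]
        i += 1
        if c == '*':
            if not out or out[-1] != '*':   # re-checked on every single '*'
                out.append('*')
        else:
            out.append(c)
    return out

# Both produce the same compressed chunk sequence.
assert scan_with_while('a***b**') == scan_with_if('a***b**') == ['a', '*', 'b', '*']
```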

@barneygale
Contributor

Thank you very much for the explanation!

Q: could the cost of re.escape() be ameliorated with an lru_cache() instead? e.g.:

diff --git a/Lib/fnmatch.py b/Lib/fnmatch.py
index 73acb1fe8d4..dc9bf122e39 100644
--- a/Lib/fnmatch.py
+++ b/Lib/fnmatch.py
@@ -149,11 +149,14 @@ def _translate(pat, STAR, QUESTION_MARK):
                         stuff = '\\' + stuff
                     add(f'[{stuff}]')
         else:
-            add(re.escape(c))
+            add(_escape_char(c))
     assert i == n
     return res
 
 
+_escape_char = functools.lru_cache(maxsize=32768)(re.escape)
+
+
 def _join_translated_parts(inp, STAR):
     # Deal with STARs.
     res = []

@picnixz
Member Author
picnixz commented Aug 17, 2024

It could be; I didn't check this alternative. We could also make it even faster with a simple dictionary lookup, since the implementation is private and only strings are supposed to be passed. Alternatively, we could inline the escaping directly. I'll try your suggestion today.

@picnixz
Member Author
picnixz commented Aug 17, 2024

Huh, so your approach is actually the best. (The cache is cleared before each benchmark, but since there are many iterations for each row, the first call, which caches the result, is probably negligible.)

+------------------------------------------+-----------------------+-----------------------+
| Benchmark                                | fnmatch-translate-ref | fnmatch-translate-py  |
+==========================================+=======================+=======================+
| abc/[!]b-ac-z9-1]/def/\*?/*/**c/?*[][!]  | 6.09 us               | 3.99 us: 1.53x faster |
+------------------------------------------+-----------------------+-----------------------+
| !abc/[!]b-ac-z9-1]/def/\*?/*/**c/?*[][!] | 6.39 us               | 4.07 us: 1.57x faster |
+------------------------------------------+-----------------------+-----------------------+
| a**?**cd**?**??k***                      | 2.24 us               | 1.51 us: 1.49x faster |
+------------------------------------------+-----------------------+-----------------------+
| a/**/b/**/c                              | 1.97 us               | 1.12 us: 1.76x faster |
+------------------------------------------+-----------------------+-----------------------+
| man/man1/bash.1                          | 3.00 us               | 1.21 us: 2.48x faster |
+------------------------------------------+-----------------------+-----------------------+
| a*b*c*d*e*f*g*h*i*j*k*l*m*n*o**          | 5.40 us               | 3.33 us: 1.62x faster |
+------------------------------------------+-----------------------+-----------------------+
| Geometric mean                           | (ref)                 | 1.71x faster          |
+------------------------------------------+-----------------------+-----------------------+

There are significant performance gains for each run. I totally forgot about the possibility of caching re.escape. I'll try to reduce the memory cost further by simply inlining re.escape. (Actually, there's no need for that much micro-optimization; it would be hard to maintain, so let's keep your solution.)

@picnixz
Member Author
picnixz commented Aug 17, 2024

I'll also update the C implementation. I discovered two other optimizations that I could apply on the C side, but I think the C implementation is not worth it unless we want a 3x improvement (here we are at roughly 1.75x, but I think we could go up to 2.8x-3.5x in C).

EDIT: We were eventually able to gain a 6x improvement, but since the maintenance cost is high (I would probably be the one responsible for it, and I confess I'd prefer maintaining the Python implementation), I decided to give up on the C implementation.

barneygale added a commit that referenced this issue Nov 27, 2024
Improve performance of this function by a factor of 1.7x.

Co-authored-by: Barney Gale <barney.gale@gmail.com>
@picnixz
Member Author
picnixz commented Nov 27, 2024

Closing since completed.

@picnixz picnixz closed this as completed Nov 27, 2024
ebonnal pushed a commit to ebonnal/cpython that referenced this issue Jan 12, 2025
…122289)

Improve performance of this function by a factor of 1.7x.

Co-authored-by: Barney Gale <barney.gale@gmail.com>