-
-
Notifications
You must be signed in to change notification settings - Fork 7.9k
[MNT]: mathtext.MathTextParser is slow #20821
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
Effectively re-writing the mathtext module does not seem a small project (as I assume the output of the parsing step will be different enough that we will have to re-write the code that consumes it). It might be a good GSOC project? A less invasive fix would might to add some strategic If your text is not changing frequently you may also get a speed win (at the cost of an expensive first draw) by switching to |
It is probably not easy to replace the parser, true. The hot-fix for my app was to remove all LaTeX, which produced a perceivable performance gain. Some of the LaTeX was static, some was dynamically changing. I did not try to remove only the dynamic components which cannot be cached. |
I'd be happy to work with a maintainer on possible performance improvements in this parser. |
That would be very appreciated @ptmcg ! Our latex parsing code is in https://github.com/matplotlib/matplotlib/blob/main/lib/matplotlib/_mathtext.py and for parsing font config strings in https://github.com/matplotlib/matplotlib/blob/main/lib/matplotlib/fontconfig_pattern.py. I suspect that there is a lot more opportunities for optimization in the latex parsing code, but the font config code is more frequently run (but could be wrong about both of those). |
Thanks! From a quick profiling, I guess it may help a bit to replace the custom caching mechanism (_parseCache) by lru_cache, although things are getting complicated by the presence of debugActions which always get executed (so one would need to maintain the old path in the presence of debugActions, and only enable the lru_cache-based one in their absence?). |
Is the custom caching even being used? I don't see enablePackrat being called anywhere. |
matplotlib/lib/matplotlib/_mathtext.py Line 30 in d358cc3
|
My mistake, I was looking at an extract from _mathtext.py that I was using to test in isolation, and had left out that line 🤦. Replacing the custom cache in pyparsing with lru_cache is probably not going to be helpful, as lru_cache won't cache exceptions and pyparsing expressions raise exceptions far more often than they return matched tokens. I looked at the font config parser, and it is pretty simple. I don't see any obvious places for improvement. In
Some things not to do:
Just to set expectations: In general, I've found my pyparsing tweaking to have at best 10-30% improvement, no orders of magnitude boosts (other than when adding packrat parsing to a recursive grammar, especially one using I've also attached PDF and HTML versions of a railroad diagram for this parser. From the generated regexes and expression names for the |
Thank you very much for the many detailed suggestions. In the following, I use As a side point, it seems a pity that literal strings each generate their own Literal instance that is not shared. I guess in theory they are mutable (one could re-access the nodes and set different parse actions on them), but perhaps (just armchair-quarterbacking here) it may make sense to declare that literal strings generate immutable Literal instances and dedupe them? |
Ouch! I misspoke. I can see why I did not do this before, in that |
3.0.2 has been pushed out, please try it, your |
Thanks for the suggestion, but I guess I'll probably just go for the good-old-regex route (discussion starting at #21454 (comment)). In any case that'll probably wait until after 3.5 is release to settle things down. |
Also, AFAICS restoring explicitly reused Literals for braces and backslashes on top of #21448 is actually slower (by ~5% on #21448 (comment)). |
Notes to self:
|
Summary
The mathtext.MathTextParser is a performance bottleneck.
I profiled my interactive app that uses PyQt and matplotlib to draw complex plots with very little text and yet significant time (20 %) is spend in the mathtext.MathTextParser. See
profile_graph.pdf
Proposed fix
The parser is using pyparsing internally and could potentially be speed up by switching to lark.
The text was updated successfully, but these errors were encountered: