-
-
Notifications
You must be signed in to change notification settings - Fork 2.9k
Speed up union simplification #12526
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
Every large union case that I've observed in real code involves large numbers of literals — either In pyright, I mitigate this problem by maintaining maps of |
The colour-science package also uses medium-sized unions (e.g. ~10 items) without any literals, which are slow enough to cause trouble (#12225). The unions contain protocol types that are particularly slow to process by mypy. Caching negative proper subtype checks involving protocol types could already help quite a bit with colour-science.
Mypy does something a bit similar in a few places. However, we decompose the union types into literals vs non-literals separately on every operation. This sounds a bit different from what pyright does, and the pyright approach sounds more efficient. We could consider adopting this approach if smaller optimizations don't seem enough. |
Here's one more idea:
|
Mypyc is bad at compiling nested functions. In one use case we were spending 7% of the mypy runtime just creating closure objects for `check_argument`. Here I manually inline the nested function to avoid this overhead. Work on #12526.
Mypyc is bad at compiling nested functions. In one use case we were spending 7% of the mypy runtime just creating closure objects for `check_argument`. Here I manually inline the nested function to avoid this overhead. Work on #12526.
This shows up as a bottleneck in some use cases, based on profiling. This should help with union simplification (#12526).
This shows up as a bottleneck in some use cases, based on profiling. This should help with union simplification (#12526).
It can be a bottleneck in some use cases. Work on #12526.
This is very performance critical. Implement a few micro-optimizations to speed caching a bit. In particular, we use dict.get to reduce the number of dict lookups required, and avoid tuple concatenation which tends to be a bit slow, as it has to construct temporary objects. It would probably be even better to avoid using tuples as keys altogether. This could be a reasonable follow-up improvement. Avoid caching if last known value is set, since it reduces the likelihood of cache hits a lot, because the space of literal values is big (essentially infinite). Also make the global strict_optional attribute an instance-level attribute for faster access, as we might now use it more frequently. I extracted the cached subtype check code into a microbenchmark and the new implementation seems about twice as fast (in an artificial setting, though). Work on #12526 (but should generally make things a little better).
Mypyc is bad at compiling nested functions. In one use case we were spending 7% of the mypy runtime just creating closure objects for `check_argument`. Here I manually inline the nested function to avoid this overhead. Work on #12526. Co-authored-by: Jelle Zijlstra <jelle.zijlstra@gmail.com>
make_simplified_union is used in a lot of places and therefore accounts for a significant share to typechecking time. Based on sample metrics gathered from a large real-world codebase we can see that: 1. the majority of inputs are already as simple as they're going to get, which means we can avoid allocation extra lists and return the input unchanged 2. most of the cost of `make_simplified_union` comes from `is_proper_subtype` 3. `is_proper_subtype` has some caching going on under the hood but it only applies to `Instance`, and cache hit rate is low in this particular case because, as per 1) above, items are in fact rarely subtypes of each other To address 1, refactor `make_simplified_union` with an optimistic fast path that avoid unnecessary allocations. To address 2 & 3, introduce a cache to record the result of union simplification. These changes are observed to yield significant improvements in a real-world codebase: a roughly 10-20% overall speedup, with make_simplified_union/is_proper_subtype no longer showing up as hotspots in the py-spy profile. For python#12526
I have a PR doing just that. I'm not too worried about the cache growing too large. I added some logging of calls to Overall performance improvement seems to be around 10-20% for this particular codebase. Fwiw, the bulk of the slowness comes from inputs with ~10 |
make_simplified_union is used in a lot of places and therefore accounts for a significant share to typechecking time. Based on sample metrics gathered from a large real-world codebase we can see that: 1. the majority of inputs are already as simple as they're going to get, which means we can avoid allocation extra lists and return the input unchanged 2. most of the cost of `make_simplified_union` comes from `is_proper_subtype` 3. `is_proper_subtype` has some caching going on under the hood but it only applies to `Instance`, and cache hit rate is low in this particular case because, as per 1) above, items are in fact rarely subtypes of each other To address 1, refactor `make_simplified_union` with an optimistic fast path that avoid unnecessary allocations. To address 2 & 3, introduce a cache to record the result of union simplification. These changes are observed to yield significant improvements in a real-world codebase: a roughly 10-20% overall speedup, with make_simplified_union/is_proper_subtype no longer showing up as hotspots in the py-spy profile. For python#12526
It seems like it would be trivial to just clear the cache if we happen to hit an unexpected scenario where the cache grows too large. Normally we wouldn't need this, but if we do, the cost of occasionally clearing the cache is probably still fairly minor.
This is a great result! I did some profiling and I believe that non-trivial performance improvements should be fairly common (at least in the 1-10% range), and in some cases the impact could be even higher than 20%. However, I also noticed that our current From the above comment, these are main issues I've found:
(1) should be easy to fix. Somebody just needs to review all (2) Is more tricky. One option might be to remove the uses of (3) This could be fixed by just including |
As mentioned in #12659
|
Union simplification (
make_simplified_union
) has been causing multiple performance issues (at least #9169, #12408, #12225). It can make proper subtype checks of all union items against all other items, which is O(n**2) -- with certain O(n) fast paths that cover some (but not all) problematic scenarios. Union simplification is fairly performance-critical even when we don't hit worst-case scenarios.Here are some ideas about what we might do to improve the situation:
Instance
types (at least simple ones) in close to linear time. I suspect that this is possible under some reasonable assumptions.mypy.typestate
). This might have some drawbacks, such as a possible explosion of cache sizes. I assume there's a reason why we aren't currently doing this. Union simplification tends to perform many proper subtype checks with negative results.X | None
).The text was updated successfully, but these errors were encountered: