Speed up union simplification #12526

JukkaL · 2022-04-05T10:36:22Z

Union simplification (make_simplified_union) has been causing multiple performance issues (at least #9169, #12408, #12225). It can make proper subtype checks of all union items against all other items, which is O(n**2) -- with certain O(n) fast paths that cover some (but not all) problematic scenarios. Union simplification is fairly performance-critical even when we don't hit worst-case scenarios.

Here are some ideas about what we might do to improve the situation:

1. Somehow implement union simplification of multiple Instance types (at least simple ones) in close to linear time. I suspect that this is possible under some reasonable assumptions.
2. Cache negative results of proper subtype checks. I think that currently we only cache positive results (in mypy.typestate). This might have some drawbacks, such as a possible explosion of cache sizes. I assume there's a reason why we aren't currently doing this. Union simplification tends to perform many proper subtype checks with negative results.
3. Avoid doing full union simplification in some cases, perhaps based on some heuristics. Union simplification should never be semantically necessary.
4. Add fast paths for the most common union simplification operations (e.g. single item, X | None).
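To make the negative-caching idea concrete, here is a minimal sketch on a toy class hierarchy. It is not mypy's code: `BASES`, `SubtypeCache`, and the string-based "types" are hypothetical stand-ins, and the toy check is a plain nominal subtype walk rather than mypy's proper subtype check. The point is only that a `False` result is stored just like a `True` one, so a repeated failing check costs a single dict lookup.

```python
from typing import Dict, Optional, Tuple

# Hypothetical toy hierarchy: class name -> direct base (None for the root).
BASES: Dict[str, Optional[str]] = {
    "object": None,
    "int": "object",
    "bool": "int",
    "str": "object",
}


class SubtypeCache:
    """Caches both positive and negative results of subtype checks."""

    def __init__(self) -> None:
        self._results: Dict[Tuple[str, str], bool] = {}
        self.misses = 0  # number of checks that had to be computed

    def is_subtype(self, left: str, right: str) -> bool:
        key = (left, right)
        cached = self._results.get(key)
        if cached is not None:
            # Hit: may be True *or* False, both are stored.
            return cached
        self.misses += 1
        result = self._walk(left, right)
        self._results[key] = result
        return result

    @staticmethod
    def _walk(left: str, right: str) -> bool:
        # Follow the chain of bases from `left` until `right` or the root.
        current: Optional[str] = left
        while current is not None:
            if current == right:
                return True
            current = BASES[current]
        return False
```

The drawback mentioned above is visible here too: every distinct failing pair adds an entry, so the cache can grow roughly quadratically in the number of types that get compared.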


erictraut · 2022-04-05T16:52:41Z

Every large union case that I've observed in real code involves large numbers of literals, either str or int literals. All of the problems cited above appear to follow this pattern.

In pyright, I mitigate this problem by maintaining maps of str and int literal values within a union. I use these two maps to avoid the O(n**2) behavior for union operations that involve literal values. So this combines aspects of your proposal 1 and 4.
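A rough sketch of the literal-map idea, under loud assumptions: this is not pyright's implementation, and the representation (a plain type is a string such as `"int"`, a literal is a `(fallback, value)` tuple such as `("int", 3)`) is invented for illustration. Hash-based lookups replace pairwise comparisons, so duplicate literals and literals absorbed by their fallback type are removed in roughly linear time.

```python
from typing import Dict, List, Tuple, Union

# Hypothetical representation: "int" is a plain type, ("int", 3) is Literal[3].
Item = Union[str, Tuple[str, object]]


def simplify_literals(items: List[Item]) -> List[Item]:
    plain: Dict[str, None] = {}  # ordered set of plain type names
    literals: Dict[Tuple[str, object], None] = {}  # ordered set of literals
    for item in items:
        if isinstance(item, tuple):
            literals[item] = None  # duplicates collapse via hashing
        else:
            plain[item] = None
    result: List[Item] = list(plain)
    # A literal is redundant if its fallback type is already in the union.
    result.extend(lit for lit in literals if lit[0] not in plain)
    return result
```

For example, `simplify_literals([("str", "a"), ("str", "b"), ("str", "a"), "int", ("int", 1)])` keeps `"int"` and the two distinct str literals. Note that this toy version moves plain types ahead of literals in the result.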

JukkaL · 2022-04-06T15:00:18Z

> Every large union case that I've observed in real code involves large numbers of literals, either str or int literals. All of the problems cited above appear to follow this pattern.

The colour-science package also uses medium-sized unions (e.g. ~10 items) without any literals, which are slow enough to cause trouble (#12225). The unions contain protocol types that are particularly slow to process by mypy. Caching negative proper subtype checks involving protocol types could already help quite a bit with colour-science.

> In pyright, I mitigate this problem by maintaining maps of str and int literal values within a union.

Mypy does something a bit similar in a few places. However, we decompose the union types into literals vs non-literals separately on every operation. This sounds a bit different from what pyright does, and the pyright approach sounds more efficient. We could consider adopting this approach if smaller optimizations don't seem enough.

JukkaL · 2022-04-06T15:06:01Z

Here's one more idea:

5. We could cache the results of `make_simplified_union` operations, since sometimes we repeatedly simplify the exact same types over and over again. We may need to flush items from the cache (e.g. using LRU caching) to avoid using lots of memory in worst case scenarios.
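A minimal sketch of what LRU caching of simplification results could look like. Everything here is hypothetical: `LRUUnionCache` and `dedupe` are illustrative names, union items are modelled as tuples of strings, and `dedupe` is only a stand-in for the real simplification.

```python
from collections import OrderedDict
from typing import Callable, Tuple

Items = Tuple[str, ...]  # stand-in for a hashable sequence of union items


def dedupe(items: Items) -> Items:
    """Stand-in for real simplification: drop exact duplicates, keep order."""
    return tuple(dict.fromkeys(items))


class LRUUnionCache:
    def __init__(self, simplify: Callable[[Items], Items], max_size: int = 1024) -> None:
        self._simplify = simplify
        self._max_size = max_size
        self._entries: "OrderedDict[Items, Items]" = OrderedDict()

    def get(self, items: Items) -> Items:
        cached = self._entries.get(items)
        if cached is not None:
            # Mark as most recently used.
            self._entries.move_to_end(items)
            return cached
        result = self._simplify(items)
        self._entries[items] = result
        if len(self._entries) > self._max_size:
            # Evict the least recently used entry.
            self._entries.popitem(last=False)
        return result
```

The cache is only as sound as the key: two inputs that compare equal must simplify to interchangeable results, which is why equality and hashing of the cached objects matter.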


This shows up as a bottleneck in some use cases, based on profiling. This should help with union simplification (#12526).

It can be a bottleneck in some use cases. Work on #12526.

This tweaks a change made in #12539 that may have slowed things down. The behavior introduced in the PR was more correct, but it's not worth a potential major performance regression, since union simplification is not something we always have to get right for correctness. Work on #12526.

This is very performance critical. Implement a few micro-optimizations to speed up caching a bit. In particular, we use dict.get to reduce the number of dict lookups required, and avoid tuple concatenation, which tends to be a bit slow, as it has to construct temporary objects. It would probably be even better to avoid using tuples as keys altogether. This could be a reasonable follow-up improvement.

Avoid caching if last known value is set, since it reduces the likelihood of cache hits a lot, because the space of literal values is big (essentially infinite).

Also make the global strict_optional attribute an instance-level attribute for faster access, as we might now use it more frequently.

I extracted the cached subtype check code into a microbenchmark and the new implementation seems about twice as fast (in an artificial setting, though).

Work on #12526 (but should generally make things a little better).
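The two lookup micro-optimizations described above can be illustrated roughly as follows. This is not the code from the PR: the cache layout, `is_cached_slow`, and `is_cached_fast` are hypothetical and only contrast the two styles.

```python
from typing import Dict, Set, Tuple

# Hypothetical layout: (strict_optional, type name) -> set of cached pairs.
Cache = Dict[Tuple[bool, str], Set[Tuple[str, str]]]


def is_cached_slow(cache: Cache, kind: Tuple[bool], name: str, pair: Tuple[str, str]) -> bool:
    key = kind + (name,)  # tuple concatenation builds a temporary tuple
    if key in cache:  # first dict lookup
        return pair in cache[key]  # second dict lookup
    return False


def is_cached_fast(cache: Cache, strict: bool, name: str, pair: Tuple[str, str]) -> bool:
    entries = cache.get((strict, name))  # key built directly, single lookup
    return entries is not None and pair in entries
```

Both functions return the same answers; the second simply does less work per call, which matters only because the check sits on a very hot path.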


Mypyc is bad at compiling nested functions. In one use case we were spending 7% of the mypy runtime just creating closure objects for `check_argument`. Here I manually inline the nested function to avoid this overhead. Work on #12526. Co-authored-by: Jelle Zijlstra <jelle.zijlstra@gmail.com>

`make_simplified_union` is used in a lot of places and therefore accounts for a significant share of type-checking time. Based on sample metrics gathered from a large real-world codebase, we can see that:

1. The majority of inputs are already as simple as they're going to get, which means we can avoid allocating extra lists and return the input unchanged.
2. Most of the cost of `make_simplified_union` comes from `is_proper_subtype`.
3. `is_proper_subtype` has some caching going on under the hood, but it only applies to `Instance`, and the cache hit rate is low in this particular case because, as per 1 above, items are in fact rarely subtypes of each other.

To address 1, refactor `make_simplified_union` with an optimistic fast path that avoids unnecessary allocations. To address 2 and 3, introduce a cache to record the result of union simplification.

These changes are observed to yield significant improvements in a real-world codebase: a roughly 10-20% overall speedup, with `make_simplified_union`/`is_proper_subtype` no longer showing up as hotspots in the py-spy profile.

For python#12526
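One way to picture the optimistic fast path, as a sketch rather than the PR's actual code: `simplify_union` and the `covers` predicate are hypothetical, and items are plain strings. No result list is allocated until the first redundant item is found, so an already-simple input is returned unchanged as the same object.

```python
from typing import Callable, List, Optional, Sequence


def simplify_union(
    items: Sequence[str], covers: Callable[[str, str], bool]
) -> Sequence[str]:
    """Drop items covered by another item; return `items` itself if none are."""
    kept: Optional[List[str]] = None  # allocated lazily on the first removal
    for i, item in enumerate(items):
        # For equal items, only a later duplicate counts as redundant.
        redundant = any(
            j != i and covers(other, item) and (other != item or j < i)
            for j, other in enumerate(items)
        )
        if redundant:
            if kept is None:
                kept = list(items[:i])  # everything before i was kept so far
        elif kept is not None:
            kept.append(item)
    return items if kept is None else kept
```

This sketch is still quadratic in the number of items. It only removes the allocation cost in the common case, which is the part point 1 above is about.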

huguesb · 2022-04-23T05:53:17Z

> Here's one more idea:
>
> 5. We could cache the results of `make_simplified_union` operations, since sometimes we repeatedly simplify the exact same types over and over again. We may need to flush items from the cache (e.g. using LRU caching) to avoid using lots of memory in worst case scenarios.

I have a PR doing just that. I'm not too worried about the cache growing too large. I added some logging of calls to `make_simplified_union` (and their inputs) while running over a large codebase (>20k files) and used `sort`/`uniq -c` to get a sense of the potential caching benefits. I saw ~70k calls to `make_simplified_union` but only ~7k distinct inputs/outputs, which seems like a pretty reasonable size for a cache. We could keep that cache even smaller by special-casing inputs with only one or two items.
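The same `sort`/`uniq -c` style estimate can be sketched in Python. `cache_potential` is a hypothetical helper, and it assumes each logged call has been reduced to one string describing its input.

```python
from collections import Counter
from typing import Iterable, Tuple


def cache_potential(logged_inputs: Iterable[str]) -> Tuple[int, int, float]:
    """Return (total calls, distinct inputs, best-case cache hit rate)."""
    counts = Counter(logged_inputs)
    total = sum(counts.values())
    distinct = len(counts)
    # With an unbounded cache, only the first call per distinct input misses.
    hit_rate = (total - distinct) / total if total else 0.0
    return total, distinct, hit_rate
```

With the figures above (~70k calls, ~7k distinct inputs), the best-case hit rate works out to roughly 90%.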

Overall performance improvement seems to be around 10-20% for this particular codebase. Fwiw, the bulk of the slowness comes from inputs with ~10 Type[]


JukkaL · 2022-04-25T15:46:27Z

> I have a PR doing just that. I'm not too worried about the cache growing too large.

It seems like it would be trivial to just clear the cache if we happen to hit an unexpected scenario where the cache grows too large. Normally we wouldn't need this, but if we do, the cost of occasionally clearing the cache is probably still fairly minor.
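A sketch of the "just clear it" approach, with hypothetical names (`ClearingCache`, `max_entries`) and tuples of strings standing in for union items. Compared with LRU eviction there is no per-hit bookkeeping; the cache is simply emptied when it reaches the limit.

```python
from typing import Callable, Dict, Tuple

Items = Tuple[str, ...]  # stand-in for a hashable sequence of union items


class ClearingCache:
    def __init__(self, compute: Callable[[Items], Items], max_entries: int = 10_000) -> None:
        self._compute = compute
        self._max_entries = max_entries
        self._entries: Dict[Items, Items] = {}
        self.clears = 0  # how often the size limit was hit

    def get(self, items: Items) -> Items:
        cached = self._entries.get(items)
        if cached is not None:
            return cached
        if len(self._entries) >= self._max_entries:
            # Unexpected growth: drop everything and start over.
            self._entries.clear()
            self.clears += 1
        result = self._compute(items)
        self._entries[items] = result
        return result
```

After a clear, the next call for each input is a miss again, which is the occasional cost described above.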

> Overall performance improvement seems to be around 10-20% for this particular codebase.

This is a great result! I did some profiling and I believe that non-trivial performance improvements should be fairly common (at least in the 1-10% range), and in some cases the impact could be even higher than 20%.

However, I also noticed that our current Type hashing implementation doesn't seem to work quite right for the caching (see #12659 (comment)). The deviations that might happen will probably be very rare, but if they happen, they could be extremely confusing, so I think it's important to address the correctness of hashing first.

From the above comment, these are main issues I've found:

1. Some Type subclasses seem to have incorrect/missing/incomplete `__eq__` and/or `__hash__` methods. These are just bugs and should be pretty easy to fix. It would be good to also have some unit tests for these.
2. The line and column attributes of Type are not used in type equality/hashing, but they might plausibly affect type checking results in edge cases (e.g. locations of errors).
3. The can_be_true and can_be_false attributes are ignored in equality/hashing, but they might plausibly affect type checking.

(1) should be easy to fix. Somebody just needs to review all __eq__ and __hash__ methods and ensure that they exist for all concrete subclasses of Type and they cover all the important attributes.

(2) is trickier. One option might be to remove the uses of line and column attributes during type checking, i.e. they'd only be used during semantic analysis, when we shouldn't perform any union simplification. Another option would be to remove line and column attributes from most Type subclasses, except the "syntactic" ones such as UnboundType where they are required for error reporting. I'm not sure how feasible this would be. We'd need to track the line numbers separately where they are needed.

(3) could be fixed by just including can_be_true and can_be_false in all `__eq__` and `__hash__` methods. A better fix might be to move these away from type objects altogether and store them in some other way, but this may be quite difficult to achieve, and potentially not worth the effort.
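To illustrate the simple fix for (3), and the situation described in (2), here is a toy class. `ToyType` is hypothetical and far simpler than mypy's `Type` hierarchy; it only shows `can_be_true`/`can_be_false` taking part in equality and hashing while `line` does not.

```python
from typing import Tuple


class ToyType:
    """Hypothetical stand-in for a type object used as a cache key."""

    def __init__(
        self,
        name: str,
        can_be_true: bool = True,
        can_be_false: bool = True,
        line: int = -1,
    ) -> None:
        self.name = name
        self.can_be_true = can_be_true
        self.can_be_false = can_be_false
        self.line = line  # deliberately excluded from equality and hashing

    def _key(self) -> Tuple[str, bool, bool]:
        return (self.name, self.can_be_true, self.can_be_false)

    def __eq__(self, other: object) -> bool:
        if not isinstance(other, ToyType):
            return NotImplemented
        return self._key() == other._key()

    def __hash__(self) -> int:
        # Must agree with __eq__: equal objects hash the same.
        return hash(self._key())
```

With this definition, two objects that differ only in `line` share one cache entry, which is exactly the case where a cached result could carry the wrong location.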

huguesb · 2022-04-30T04:30:14Z

As mentioned in #12659:

1. Definitely needs fixing. Do you have more details on which classes are affected?
2. Can easily be addressed, I think, by removing the line/column parameters of `make_simplified_union`. Most callers of `make_simplified_union` do not supply them, and updating those that do to omit those parameters results in a green CI run (including mypy_primer): see "EXPERIMENT remove line/column param to make_simplified_union" (#12698). If the callers of `make_simplified_union` do not care about line/column, then neither should the cache...
3. I'm not quite sure what to make of that right now. The semantics of those attributes are confusing me and I will need to take a closer look at them.

JukkaL added the performance label Apr 5, 2022

AlexWaygood added the refactoring Changing mypy's internals label Apr 6, 2022

JukkaL mentioned this issue Apr 6, 2022

Significant speed degradation from 0.931 to 0.941 #12408

Closed

JukkaL mentioned this issue Apr 7, 2022

Minor proper subtype check optimization #12536

Merged

JukkaL added a commit that referenced this issue Apr 7, 2022

Speed up Instance.__hash__

13494d7

This shows up as a bottleneck in some use cases, based on profiling. This should help with union simplification (#12526).

JukkaL mentioned this issue Apr 7, 2022

Speed up Instance.__hash__ #12538

Merged

97littleleaf11 pushed a commit that referenced this issue Apr 7, 2022

Speed up Instance.__hash__ (#12538)

ab1b488

This shows up as a bottleneck in some use cases, based on profiling. This should help with union simplification (#12526).

JukkaL mentioned this issue Apr 7, 2022

Speed up caching of subtype checks #12539

Merged

JukkaL added a commit that referenced this issue Apr 7, 2022

Speed up LiteralType.__hash__

73e3ef4

It can be a bottleneck in some use cases. Work on #12526.

JukkaL mentioned this issue Apr 7, 2022

Further speed up union simplification #12541

Merged

huguesb mentioned this issue Apr 23, 2022

make_simplified_union: add caching and reduce allocations #12659

Open

hauntsaninja mentioned this issue May 26, 2022

mypy runs forever with colour-science #12225

Closed

pranavrajpal mentioned this issue May 31, 2022

Refactor mypy to use query-based architecture #12911

Open

hauntsaninja mentioned this issue Jul 17, 2022

Regression from 0.750 - Literal types cause significant slowdown #9169

Closed

AlexWaygood mentioned this issue May 7, 2025

Add a benchmark involving realistic code that creates large unions astral-sh/ty#240

Open
