WIP mapOver/traverse/transform: refactor pattern match into virtual call [ci: last-only] #5906
Conversation
This appears to perform better than the pattern match. Possible factors at play:
- vtable lookup is constant time, whereas the pattern match requires a cascade of instanceof-s
- a megamorphic invokevirtual has more stable JIT profiles than the match (parts of the match can appear to be dead code depending on what phase of the compiler the JIT profiles the method in)
- the new, small methods may be more inlinable into callers that are effectively monomorphic
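For illustration, a minimal sketch of the two dispatch styles. The classes below (`Type`, `TypeRef`, `SingleType`, `TypeMap`) are simplified stand-ins invented for this example, not the actual scalac definitions touched by this PR:

```scala
// Sketch only: simplified stand-ins, not the real Types hierarchy.
sealed abstract class Type {
  // Virtual-call style: each subclass says how it is mapped, so dispatch
  // goes through the vtable instead of an instanceof cascade.
  def mapOver(map: TypeMap): Type = this
}

final case class TypeRef(pre: Type, args: List[Type]) extends Type {
  override def mapOver(map: TypeMap): Type = TypeRef(map(pre), args.map(map))
}

final case class SingleType(pre: Type) extends Type {
  override def mapOver(map: TypeMap): Type = SingleType(map(pre))
}

abstract class TypeMap extends (Type => Type) {
  // Pattern-match style: compiles to a chain of instanceof checks, and
  // branches that never fire in the current compiler phase can look like
  // dead code to the JIT's profile.
  def mapOverMatch(tp: Type): Type = tp match {
    case TypeRef(pre, args) => TypeRef(apply(pre), args.map(apply))
    case SingleType(pre)    => SingleType(apply(pre))
    case _                  => tp
  }

  // Refactored style: a single (possibly megamorphic) virtual call.
  def mapOver(tp: Type): Type = tp.mapOver(this)
}
```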
…ixin" This reverts commit c2dc346.
I am very interested to know what the underlying reasons / attributions for this performance improvement are. I thought that pattern matching and virtual calls were more or less on par. I'm surprised to see this is not the case!
> Possible factors at play:
I believe that the relative inefficiencies of pattern matching don't matter in 99% of cases. To help understand when we're in the 1% of cases, we should create some micro-benchmarks that, e.g., increase the number of cases, or provide input data in patterns that change periodically (akin to the way that certain AST nodes aren't around in later compiler phases), which can fool the JIT into overfitting to short-term patterns and lead to excessive deoptimizations.
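As a possible starting point, a hedged sketch of such a micro-benchmark in JMH style. The `Node`/`A`/`B`/`C` hierarchy, the `maxKinds` parameter, and the benchmark class are illustrative inventions, not code from this PR:

```scala
package bench

import java.util.concurrent.TimeUnit
import org.openjdk.jmh.annotations._

// Toy class hierarchy standing in for AST node kinds.
sealed abstract class Node { def visit(): Int = 0 }
final class A extends Node { override def visit(): Int = 1 }
final class B extends Node { override def visit(): Int = 2 }
final class C extends Node { override def visit(): Int = 3 }

@State(Scope.Benchmark)
@BenchmarkMode(Array(Mode.AverageTime))
@OutputTimeUnit(TimeUnit.NANOSECONDS)
class DispatchBench {
  // Upper bound on how many distinct classes appear in the input.
  @Param(Array("1", "2", "3"))
  var maxKinds: Int = _

  var data: Array[Node] = _
  private var iteration = 0

  @Setup(Level.Iteration)
  def setup(): Unit = {
    // Rotate the class mix between iterations, mimicking AST node kinds
    // that stop occurring in later compiler phases; a JIT that overfits to
    // the early mix may deoptimize when the mix changes.
    iteration += 1
    val pool = Array[Node](new A, new B, new C).take(1 + iteration % maxKinds)
    data = Array.tabulate(10000)(i => pool(i % pool.length))
  }

  @Benchmark
  def patternMatch(): Int = {
    var sum = 0
    var i = 0
    while (i < data.length) {
      sum += (data(i) match {
        case _: A => 1
        case _: B => 2
        case _: C => 3
      })
      i += 1
    }
    sum
  }

  @Benchmark
  def virtualCall(): Int = {
    var sum = 0
    var i = 0
    while (i < data.length) { sum += data(i).visit(); i += 1 }
    sum
  }
}
```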
The test failures are likely due to null types or trees appearing in places where I've removed null checks in favour of …
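A hypothetical illustration of how a removed null guard surfaces as test failures. The `Tree`/`Apply`/`Traverser` shapes below are invented for this example, and the replacement convention is not shown since it is elided above:

```scala
// If a guard like the one in Traverser.traverse is removed, a tree whose
// child slot is null reaches the virtual call and throws a
// NullPointerException, instead of being silently skipped.
abstract class Tree {
  def traverseChildren(t: Traverser): Unit = ()
}

final class Apply(val fun: Tree, val arg: Tree) extends Tree {
  override def traverseChildren(t: Traverser): Unit = {
    t.traverse(fun)
    t.traverse(arg) // `arg` may be null in some phases
  }
}

class Traverser {
  def traverse(tree: Tree): Unit =
    if (tree ne null) tree.traverseChildren(this) // the kind of guard in question
}
```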
I'm acquainting myself with the profiler in Oracle Developer Studio to get an additional perspective on things. It lets you see profiles at a finer granularity (per bytecode, or even per compiled x86 instruction), and also lets you overlay hardware counter data. I haven't spent long enough looking to draw many conclusions, but I thought I'd share its view of the hotspots. I ran the compiler benchmark using #5907 (a merge of my fruitful optimizations to date), let it warm up for 200s, and collected a profile for around 300s. Top methods, including …
Pushing the WIP branch here for the record.
The performance gains of ~2% need further attribution to individual commits
and analysis of the underlying reasons.