8000 Allow compressed displacement to be used when profitable by tannergooding · Pull Request #117288 · dotnet/runtime · GitHub
[go: up one dir, main page]

Skip to content

Allow compressed displacement to be used when profitable #117288

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 7 commits into from
Jul 7, 2025

Conversation

tannergooding
Copy link
Member

This should save 1-byte of encoding space as compared to using imm32, when applicable.

@Copilot Copilot AI review requested due to automatic review settings July 3, 2025 19:20
@github-actions github-actions bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Jul 3, 2025
Copy link
Contributor

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

Copy link
Contributor
@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR adds logic in emitter::TakesEvexPrefix to detect when a memory displacement can be compressed to 8 bits via EVEX and only emits the prefix when profitable, saving encoding space.

  • Compute stack or address displacement and check if it fits in a signed byte
  • Call TryEvexCompressDisp8Byte for larger displacements and return true when compression succeeds
Comments suppressed due to low confidence (1)

src/coreclr/jit/emitxarch.cpp:1926

  • No tests were added to verify the new compressed displacement logic (TryEvexCompressDisp8Byte); consider adding unit tests covering both compressible and non-compressible displacement cases to ensure correct behavior.
            dsp = TryEvexCompressDisp8Byte(id, dsp, &dspInByte);

@tannergooding
Copy link
Member Author
tannergooding commented Jul 5, 2025

CC. @dotnet/jit-contrib This is generally ready for review and the diffs look overall good.

Linux barely has any asm diffs due to having no callee saved SIMD registers and different passing semantics. This causes its spilling behaviors and general usage patterns to differ, so there aren't as many opportunities. However, it has significant TP improvements because the relevant checks are no longer happening when they can't benefit

Windows however has significant asm diffs and it has improved TP in cases where there is little to no SIMD usage or no ability to use compressed-displacement, but a TP hit (namely in the SIMD heavy tests/benchmarks) when it does. Showcasing that the feature is pay for play.

There are notably a couple size regressions I'm looking at, since we shouldn't have any places that are now allocating "more" bytes than before. But this is likely a minor tweak to the current code. Figured this out. emitInsSizeSV uses UNATIVE_OFFSET for the offs calculation, while emitOutputSV uses ssize_t. We need to preserve the cast to int in emitInsSizeSV to ensure the relevant sign extension occurs and we check the right value. -- This should probably be rewritten to use ssize_t, like other places in emitxarch do, particularly since it would allow simplifying some of the other logic; but that should be a follow up PR

@tannergooding
Copy link
Member Author

Linux

ASM Diffs:

Overall (-148 bytes)
FullOpts (-148 bytes)

Throughput Impact:

Overall (-0.24% to -0.06%)
MinOpts (-0.59% to -0.35%)
FullOpts (-0.15% to -0.06%)

Windows

ASM Diffs:

Overall (-1,547,961 bytes)
MinOpts (-1,246,735 bytes)
FullOpts (-301,226 bytes)

Throughput Impact

Overall (-0.17% to +0.03%)
MinOpts (-0.63% to +0.13%)
FullOpts (-0.17% to +0.02%)

For most cases, we see reductions in the actual instruction groups because we are now choosing to emit the EVEX encoding to get the 1-byte savings from using compressed displacement where profitable.

In a few cases, however, we see no change to the Total bytes of code but rather a reduction in the allocated bytes for code instead. This is because we are more correctly predicting how many bytes are required which allows each method to allocate less.

For a few methods, the reduction in Total bytes of code also results in further savings because many jumps can become SHORT where they previously were not.

There are a couple of T0 methods where the total bytes of code has regressed (gone up). This is due to the quirks mentioned above in emitInsSizeSV. A separate PR should be done to clean that up and make it simpler/more inline with how emitOutputSV is actually consuming the prediction.

Windows x86 sees less improvement due to it not having FEATURE_FIXED_OUT_ARGS. This causes it to need to dynamically adjust the offset based on the emitCurStackLvl. There's some quirks in how the SV paths handle this, mentioned above, and just some general complexity in this blocking the ability to decide if compressed displacement can be used up front which makes the feature not very pay for play. As such, we pessimize to not using compressed displacement in such scenarios so there are a few more methods with regressions (but overall still by far an improvement).

Copy link
Member
@EgorBo EgorBo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Wo 8000 rth running some outerloops?

@tannergooding
Copy link
Member Author

/azp run runtime-coreclr jitstress-isas-x86, Fuzzlyn, Antigen, runtime-coreclr jitstress, runtime-coreclr jitstressregs

Copy link
Azure Pipelines successfully started running 5 pipeline(s).

@tannergooding
Copy link
Member Author
8000

/azp run runtime-coreclr jitstress-isas-x86, Fuzzlyn, Antigen, runtime-coreclr jitstress, runtime-coreclr jitstressregs

Copy link
Azure Pipelines successfully started running 5 pipeline(s).

@tannergooding tannergooding merged commit c2dbdc9 into dotnet:main Jul 7, 2025
155 of 174 checks passed
@tannergooding tannergooding deleted the preferDisp8 branch July 7, 2025 23:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants
0