-
Notifications
You must be signed in to change notification settings - Fork 5.1k
Allow compressed displacement to be used when profitable #117288
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR adds logic in emitter::TakesEvexPrefix to detect when a memory displacement can be compressed to 8 bits via EVEX and only emits the prefix when profitable, saving encoding space.
- Compute stack or address displacement and check if it fits in a signed byte
- Call TryEvexCompressDisp8Byte for larger displacements and return true when compression succeeds
Comments suppressed due to low confidence (1)
src/coreclr/jit/emitxarch.cpp:1926
- No tests were added to verify the new compressed displacement logic (TryEvexCompressDisp8Byte); consider adding unit tests covering both compressible and non-compressible displacement cases to ensure correct behavior.
dsp = TryEvexCompressDisp8Byte(id, dsp, &dspInByte);
5fcafd8
to
5344cbd
Compare
4f30736
to
a96769c
Compare
a96769c
to
15d599c
Compare
15d599c
to
c017051
Compare
25acfbf
to
e75ba7f
Compare
e75ba7f
to
dfc08c9
Compare
CC. @dotnet/jit-contrib This is generally ready for review and the diffs look overall good. Linux barely has any asm diffs due to having no callee saved SIMD registers and different passing semantics. This causes its spilling behaviors and general usage patterns to differ, so there aren't as many opportunities. However, it has significant TP improvements because the relevant checks are no longer happening when they can't benefit Windows however has significant asm diffs and it has improved TP in cases where there is little to no SIMD usage or no ability to use compressed-displacement, but a TP hit (namely in the SIMD heavy tests/benchmarks) when it does. Showcasing that the feature is pay for play.
|
182f73e
to
9d66a2b
Compare
9d66a2b
to
c8df8ef
Compare
LinuxASM Diffs:
Throughput Impact:
WindowsASM Diffs:
Throughput Impact
For most cases, we see reductions in the actual instruction groups because we are now choosing to emit the In a few cases, however, we see no change to the For a few methods, the reduction in There are a couple of Windows x86 sees less improvement due to it not having |
…ARGS isn't available
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Wo 8000 rth running some outerloops?
/azp run runtime-coreclr jitstress-isas-x86, Fuzzlyn, Antigen, runtime-coreclr jitstress, runtime-coreclr jitstressregs |
Azure Pipelines successfully started running 5 pipeline(s). |
8000
/azp run runtime-coreclr jitstress-isas-x86, Fuzzlyn, Antigen, runtime-coreclr jitstress, runtime-coreclr jitstressregs |
Azure Pipelines successfully started running 5 pipeline(s). |
This should save 1-byte of encoding space as compared to using imm32, when applicable.