-
-
Notifications
You must be signed in to change notification settings - Fork 8.2k
py/emitinlinextensa: Add all supported LX3/LX106 opcodes to the inline assembler. #16731
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## master #16731 +/- ##
=======================================
Coverage 98.54% 98.54%
=======================================
Files 169 169
Lines 21898 21898
=======================================
Hits 21579 21579
Misses 319 319 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
Code size report:
|
Hello @agatti, You noted in the tests:
I suggest the following correction for this in emitinlinextensa.c (line 472ff):
|
Right, turns out it's also what GAS does when assembling the SLLI opcode. Thanks for pointing it out, I'll fix it right away. |
cf2e027
to
ad02cf3
Compare
Thanks. Another one which is purely cosmetic:
in the tests should be titled lsr15, because you do the shift by 15 bits, right? |
Yep. SRLI only accepts from 1 to 15 as its shift parameter - good catch. I'll fix this after the current test run passes. Thanks! |
ad02cf3
to
1a49084
Compare
I had in my Preliminary asm_xtensa docs
which includes that srli is only shiftable by 0..15 bits (i.e. imm4). Consequently now the first test in xtensa_asmshift.py fails and produces
I ask myself why below "All checks have passed". Wouldn't that give an error? |
I've implemented } else if (op == MP_QSTR_slli) {
mp_uint_t r0 = get_arg_reg(emit, op_str, pn_args[0]);
mp_uint_t r1 = get_arg_reg(emit, op_str, pn_args[1]);
mp_uint_t bits = 32 - get_arg_i(emit, op_str, pn_args[2], 1, 31);
asm_xtensa_op24(&emit->as, ASM_XTENSA_ENCODE_RRR(0, 1, 0 | ((bits >> 4) & 0x01), r0, r1, bits & 0x0F)); The case for shifts larger than 15 is handled, the third argument of ASM_XTENSA_ENCODE_RRR (Sorry, I misread SLLI for SRAI, not enough coffee I guess) |
I added the snippet:
to the test file So the results seem to show that there is something wrong yet. Can you confirm that? |
Both documents' instruction descriptions match indeed. Still, thanks for the snippet, I'll look into it later today. |
(Just a quick test, I'll update this comment later.) The opcode generated for
And that opcode appears at 0x710a28d539f1:
|
You're calling |
Yeah. And I also checked the code. It is correct now. And the tests pass. I had the mistake that you found in it. It might be useful to add the test for asr for 24 bits anyway, because of the complications.
My translation of the encoding is correct but made things more complicated. I replaced the data statements by the slli and srli in my CRC code and the tests for correctness pass. |
1a49084
to
9b6ed4c
Compare
No worries, just pushed a new version with an additional ASR test. |
9b6ed4c
to
a8274c8
Compare
I left over some code that was used by opcodes that don't work on the 8266, now removed. Sorry! |
Thanks for this, it's a very nice enhancement. I tested it on an esp8266 and the new tests all pass. The code size increase for this PR is +2200 bytes on ESP8266_GENERIC. |
2.2KiB... I thought it took less space. I wonder if using a similar approach as what the RV32 assembler does but with one table per argument count would reduce the footprint - even by just a bit, as there's a lot of small duplicated code fragments that can be streamlined into fewer instances. The compiler used for the 8266 port is not that recent so maybe on ESP32 this is less of an issue, but the ESP32 has more space available. Upgrading the 8266 port to use a more recent compiler isn't something I'm looking forward to :) Given that the plan is to eventually submit a PR to make this feature available on ESP32 as well, would it make sense to add a PR to add a QEMU Xtensa port in order to be able to put the inline assembler (and natmods too) under CI? Regular QEMU seems to support a few Xtensa targets but I need to see which instruction sets they support, though. |
This is not a show stopper for this PR. If we are short on space in the future, we can look into optimisations then.
If that's not too much work/effort, then yes that would be a good idea. |
525cb18
to
13a659d
Compare
I believe this should be in its final form. Changes since last review round:
@rkompass If you want to start to work on documentation when you've got some time, that'd be great :) Anyway, once I'm done with Picolibc stuff for natmods I'll start working on ESP32/ESP32S3 (LX6+LX7) opcodes. I'll skip floating point opcodes as I don't think any other inline assembler implementation supports them anyway. |
@agatti I will start in a few days, hopefully. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Apart from the comments above, this is looking very good now.
This commit removes old raw printf calls happening inside certain branch opcode emitters, indicating the target label is out of range for the opcode. They have been replaced with a RuntimeError being raised in these cases, using a parameterised qstr instead. Whilst this technically breaks runtime behaviour expectations, the generated code would not have worked anyway so it's better to catch those cases early. This should be updated to always emit long jumps unless jumps are backwards and short enough, following the other ports, but that's something coming later. This is actually needed because there are test files that do not work when processed through mpy-cross and entirely converted to native code. The original implementation would still generate mostly-valid code that was bound to crash on the device, whilst this change would prevent invalid code to even be emitted in the first place. Signed-off-by: Alessandro Gatti <a.gatti@frob.it>
This commit fixes compilation errors occurring when enabling the Xtensa code dumper inside mpy-cross. The original code was meant to dump the code from an Xtensa device itself, but for debugging the inline assembler this functionality was also needed off-line. The changes involve solving a signed/unsigned mismatch that was not much of a problem for the 8266's gcc version but made modern compilers complain, and using the printf formatter for pointers when it comes to printing code addresses. Signed-off-by: Alessandro Gatti <a.gatti@frob.it>
This commit expands the Xtensa inline assembler to support most if not all opcodes available on the ESP8266 and LX3 Xtensa cores. This is meant as a stepping stone to add inline assembler support for the ESP32 and its LX6 core, along to windowed-specific opcodes and additional opcodes that are present only on the LX7 core (ESP32-S3 and later). New opcodes being added are covered by tests, and the provided tests were expanded to also include opcodes available in the existing implementation. Given that the ESP8266 space requirements are tighter than ESP32's, certain opcodes that won't be commonly used have been put behind a define to save some space in the general use case. Signed-off-by: Alessandro Gatti <a.gatti@frob.it>
This commit simplifies native functions' prologue code by not emitting a jump opcode that goes over the function's constants pool if the pool is empty. The original code assumed the constants pool is never empty as large 32-bits constants are commonly used, but for inline assembler functions that may not be the case. This meant that inline assembler functions may start with an unneeded jump (along with its alignment byte), using four bytes more than necessary. This commit is limited to the "xtensa" target, as "xtensawin" doesn't support inline assembler functions yet, so native functions' constant pools are almost always guaranteed to hold one or more values. Signed-off-by: Alessandro Gatti <a.gatti@frob.it>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tested with ESP32_GENERIC and ESP8266_GENERIC firmware. Works well.
Thank you!
Summary
This PR expands the Xtensa inline assembler to support most if not all opcodes available on the ESP8266 and LX3/LX106 Xtensa cores.
This is meant as a stepping stone to add inline assembler support for the ESP32 and its LX6 core, along to windowed-specific opcodes and additional opcodes that are present only on the LX7 core (ESP32-S3 and later). Rather than having a single monolithic PR with all sorts of assembler changes I chose to split this into ESP8266-only support first and then use this to add ESP32 support (see #16594).
New opcodes being added are covered by tests, and the provided tests were expanded to also include opcodes available in the existing implementation. Given that the ESP8266 space requirements are tighter than ESP32's, certain opcodes that won't be co 8000 mmonly used are put behind a define (
MICROPY_EMIT_INLINE_XTENSA_UNCOMMON_OPCODES
, disabled by default) to save some space in the general use case.Testing
The new inline assembler additions were tested on an ESP8266 with
./run-tests.py -t <device> -d inlineasm/xtensa
.Trade-offs and Alternatives
This increases the footprint of the ESP8266 firmware, as it has inline assembler enabled by default. I've tried to keep the size increase within reason but there's only so much you can do after all.
Also, some extmod tests would be emitted incorrectly when ran through
mpy-cross
to generate Xtensa/Xtensawin native code (some jumps are too far away to be encoded with a simple opcode) but now they fail at compile time rather than at runtime. This will be fixed in a later PR.