Inline assembler/c code (ESP32). #16594
Comments
In theory the ESP8266 inline assembler code could be repurposed for this (I believe the opcodes are almost the same between the ESP8266 and ESP32), but the way function calls are handled is quite different [1]. I'm not an expert on Xtensa matters, so I can't say whether supporting the windowed ABI is easy or not; maybe somebody with direct Xtensa experience can chime in with more detailed information. This looks interesting, though, so maybe I can give it a try once I get some free time for non-RISC-V work.
[1] See also https://sachin0x18.github.io/posts/demystifying-xtensa-isa/#calling-convention
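The key difference in the windowed ABI is register-window rotation. The small Python model below is a hypothetical illustration (the helper name `caller_register` is made up for this sketch): after CALL4/CALL8/CALL12 and the callee's ENTRY, the window is rotated by 4/8/12 registers, so the callee's aN is the caller's a(N + increment). With CALL8, for example, arguments the callee sees in a2..a7 must have been placed by the caller in a10..a15.

```python
# Hypothetical model of Xtensa windowed-ABI register rotation (not MicroPython code).
# After CALLn + ENTRY, the callee's aN is the same physical register
# as the caller's a(N + increment), where increment is 4, 8 or 12.

WINDOW_INCREMENTS = {"call4": 4, "call8": 8, "call12": 12}


def caller_register(callee_reg: int, call_op: str) -> int:
    """Return the caller-visible register number for the callee's a<callee_reg>."""
    if not 0 <= callee_reg <= 15:
        raise ValueError("Xtensa address registers are a0..a15")
    caller_reg = callee_reg + WINDOW_INCREMENTS[call_op]
    if caller_reg > 15:
        # These are fresh physical registers the caller cannot name directly.
        raise ValueError(f"a{callee_reg} is not visible to the caller after {call_op}")
    return caller_reg


# Arguments the CALL8 callee reads from a2..a7 come from the caller's a10..a15.
print([f"a{caller_register(r, 'call8')}" for r in range(2, 8)])
```

The same mapping explains why a value returned in the callee's a2 shows up in the caller's a10 after a CALL8.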
@agatti I would love it if you found the time to start working on this. I studied the ESP8266 inline assembler in order to use it in my CRC code, and there the benchmarks showed that the Xtensa architecture is on par with ARM with respect to performance. I was wondering what prevented the completion of the inline assembler for the ESP architecture; I guess it was the need to prioritize other work at the time. Otherwise, at least the ESP8266 assembler would have been completed.
The ESP32 viper emitter emits code according to this ABI, so from my limited understanding this problem seems to be solved already. It appears as if the completion of the inline assembler was simply skipped. Now that you have completed the inline assembler for RISC-V, Xtensa is the only important architecture still missing one. I would love to help with development, testing and documentation, but I didn't start working on it myself because I had the feeling that it was above my level of expertise.
If I recall correctly, viper code doesn't need to handle function calls; that's the native emitter's job. The native emitter does indeed have code for windowed function prologues/epilogues, but the inline assembler doesn't use it: it has its own thing (same for the Thumb and RV32 inline assembler implementations). I'll need to read up on this. It doesn't look too hard, but I haven't written much Xtensa assembler myself.
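For reference, a windowed prologue/epilogue pair is short. The sketch below is a hypothetical text generator (not how MicroPython's native emitter is written) that wraps a body in ENTRY and RETW.N. It assumes the frame size should be rounded up to a multiple of 16 to keep the stack 16-byte aligned, and that ENTRY encodes the frame size as a 12-bit immediate scaled by 8.

```python
# Hypothetical sketch: textual windowed-ABI prologue/epilogue around a function body.
# Assumptions: 16-byte stack alignment; ENTRY's frame size is imm12 * 8 bytes.

ENTRY_MAX_FRAME = 0xFFF * 8  # 32760 bytes, largest size a single ENTRY can encode


def windowed_wrapper(body: list[str], frame_size: int = 16) -> list[str]:
    if frame_size < 0:
        raise ValueError("frame size must be non-negative")
    aligned = (frame_size + 15) & ~15  # round up to a multiple of 16
    if aligned > ENTRY_MAX_FRAME:
        raise ValueError("frame too large for a single ENTRY instruction")
    # ENTRY applies the window rotation requested by the preceding CALLn and
    # allocates the frame by decrementing the stack pointer (a1);
    # RETW.N rotates the window back and returns to the caller.
    return [f"entry a1, {aligned}"] + body + ["retw.n"]


for line in windowed_wrapper(["add.n a2, a2, a3"], frame_size=20):
    print(line)
```

A CALL0-style (ESP8266) routine would instead end with a plain RET/RET.N and need no ENTRY, which is why the two ABIs need separate prologue/epilogue handling.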
Looks like it. I've taken a look, it shouldn't be too difficult to add the remaining opcodes. It won't look as nice and regular as the RV32 assembler (I'm biased :)), but the 8266 has tighter space requirements so compromises were made back then and will still need to be made going forward.
If you do, you may also want to consider merging #16524 into your local source tree; it contains a couple of fixes that aren't super-urgent right now but can help nonetheless.
Adding opcodes is not that difficult: the complicated work has already been done (usually figuring out opcode encodings and how to handle relative offsets for jumps and branches), but I understand if you prefer delegating this to others. Still, I'll ping you once I have something running so you can test the result if you feel like it, mostly to figure out which opcodes are present on the ESP32 and which ones aren't on the ESP8266. From what I understood, Xtensa doesn't really have a minimum set of opcodes that must be present, unlike RISC-V profiles, and nothing stops you from synthesizing an Xtensa core with a partial implementation of a specific profile, for example.
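To make the "relative offsets for jumps" part concrete, here is a hypothetical Python encoder for the unconditional J instruction (the function name `encode_j` is illustrative, not part of MicroPython). It assumes the CALL-format layout: an 18-bit signed offset in bits 23..6, n = 0 in bits 5..4, op0 = 0b0110 in bits 3..0, with the target computed as PC + 4 + offset.

```python
# Hypothetical encoder for the Xtensa "J" instruction (CALL format, 3 bytes).
# Target = PC + 4 + offset, where offset is an 18-bit signed value.

J_OFFSET_MIN, J_OFFSET_MAX = -(1 << 17), (1 << 17) - 1


def encode_j(pc: int, target: int) -> bytes:
    offset = target - (pc + 4)
    if not J_OFFSET_MIN <= offset <= J_OFFSET_MAX:
        raise ValueError(f"jump offset {offset} out of 18-bit signed range")
    word = ((offset & 0x3FFFF) << 6) | 0b0110  # offset | n=0 | op0=0b0110
    return word.to_bytes(3, "little")  # Xtensa instructions are stored little-endian


print(encode_j(0x100, 0x100).hex())  # "j ." (jump to itself): 06ffff
```

The range check is the part an assembler has to get right: a branch whose target lies outside the encodable window must be rejected (or rewritten) rather than silently truncated.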
@rkompass I'm almost done with the core architecture opcodes. I still need to figure out how the PC-relative load/store instructions need to be properly emitted (mixed 2- and 3-byte instructions, but with word-alignment constraints...), but they're implemented nonetheless. If you want to try it out, feel free to take a look at https://github.com/agatti/micropython/tree/xtensa-inline (no guarantees that this branch will stay there or that I won't force-push stuff to it, though).
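The alignment constraint can be illustrated with L32R, the PC-relative literal load. The hypothetical helper below assumes the address rule ((PC + 3) & ~3) + (imm16 extended with ones, shifted left by 2), so literals must be word-aligned and placed before the instruction, and it assumes the RI16 layout (imm16 in bits 23..8, target register in bits 7..4, op0 = 0b0001). Because the base is rounded, whether the preceding code contains 2-byte (narrow) or 3-byte instructions changes the encoded offset.

```python
# Hypothetical L32R encoder (RI16 format, 3 bytes).
# Literal address = ((PC + 3) & ~3) + (ones-extended imm16 << 2), i.e. -262144..-4.


def encode_l32r(pc: int, at_reg: int, literal_addr: int) -> bytes:
    if not 0 <= at_reg <= 15:
        raise ValueError("target register must be a0..a15")
    if literal_addr % 4:
        raise ValueError("L32R literals must be word-aligned")
    base = (pc + 3) & ~3  # PC rounded up to the next word boundary
    delta = literal_addr - base
    if not -(1 << 18) <= delta <= -4:
        raise ValueError(f"literal offset {delta} out of range (must precede the instruction)")
    imm16 = (delta >> 2) & 0xFFFF
    word = (imm16 << 8) | (at_reg << 4) | 0b0001  # imm16 | t | op0
    return word.to_bytes(3, "little")


# Same literal at 0xF0, but the L32R lands at different PCs depending on
# whether the instruction before it was 2 or 3 bytes long.
print(encode_l32r(0x100, 2, 0xF0).hex())  # 21fcff
print(encode_l32r(0x102, 2, 0xF0).hex())  # 21fbff
```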
Congratulations @agatti!
Searching differently, I came across PR #5082 today.
Thanks for the link. I'll tackle ESP32-supported opcodes when/if the ESP8266 PR is merged (or if the maintainers prefer to also have ESP32 support to warrant a merge). I also don't own an ESP32-S3 board to test the LX7-only opcodes on, so that bit will have to wait. So far I have only tested things on a random ESP8266 board I repurposed from another project. The official documentation from Cadence doesn't say whether an opcode is LX6- or LX7-specific, so I had to literally assemble the whole lot with GNU as and .byte statements and then discard the opcodes that triggered a crash on the board I had :)
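The probing approach described above amounts to feeding raw encodings to GNU as. A hypothetical helper for producing those `.byte` lines from encoded instructions might look like this; it uses C-style `/* */` comments, which GNU as accepts on all targets.

```python
# Hypothetical helper: format raw instruction bytes as a GNU as ".byte" line,
# e.g. to hand-probe which opcodes a particular Xtensa core accepts.


def to_byte_directive(encoding: bytes, comment: str = "") -> str:
    line = ".byte " + ", ".join(f"0x{b:02x}" for b in encoding)
    return f"{line}  /* {comment} */" if comment else line


print(to_byte_directive(bytes([0xF0, 0x20, 0x00]), "nop"))
print(to_byte_directive(bytes([0x3D, 0xF0]), "nop.n"))
```

Emitting bytes rather than mnemonics sidesteps the assembler's own idea of which options the target core has, which is exactly what matters when the documentation doesn't say whether an opcode is LX6 or LX7 only.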
Haha :-) Looking again into the Xtensa ISA Reference Manual, I'm impressed by the sheer number of options for architectural additions.
How difficult would it be to enable the '@micropython.asm_xtensawin' decorator, independently of implementing additional opcodes? If you need help writing documentation, I should be able to do some work there.
I can understand not adding opcodes for address generation (addx2/subx2/...), leaving out some arithmetic opcodes, or not having the full set of branches, as you can easily write equivalent assembler code without them and minimise the MicroPython footprint taken by the feature. Not having shifts, on the other hand... Also, I've had to skip a few opcodes due to limitations of the native emitter infrastructure - forward-only jump opcodes aren't implemented.
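As an illustration of why the address-generation opcodes are easy to live without, the Python model below compares ADDX2/4/8 and SUBX2/4/8 (ar = (as << shift) ± at, with shift 1/2/3) against a SLLI followed by ADD/SUB, using 32-bit wraparound. The function names are hypothetical. The replacement costs one extra instruction and, when ar is the same register as at, one scratch register; otherwise ar itself can hold the shifted value.

```python
# Hypothetical model: ADDXn/SUBXn vs. the SLLI + ADD/SUB sequences that replace them.
MASK32 = 0xFFFFFFFF  # Xtensa address registers are 32 bits wide


def addx(shift: int, as_: int, at: int) -> int:
    """ADDX2/ADDX4/ADDX8 (shift = 1/2/3): ar = (as << shift) + at."""
    return ((as_ << shift) + at) & MASK32


def subx(shift: int, as_: int, at: int) -> int:
    """SUBX2/SUBX4/SUBX8 (shift = 1/2/3): ar = (as << shift) - at."""
    return ((as_ << shift) - at) & MASK32


def slli_then_add(shift: int, as_: int, at: int) -> int:
    tmp = (as_ << shift) & MASK32  # slli tmp, as, shift
    return (tmp + at) & MASK32  # add ar, tmp, at


def slli_then_sub(shift: int, as_: int, at: int) -> int:
    tmp = (as_ << shift) & MASK32  # slli tmp, as, shift
    return (tmp - at) & MASK32  # sub ar, tmp, at


samples = [0, 1, 3, 0x7FFFFFFF, 0x80000000, MASK32]
for shift in (1, 2, 3):
    for a in samples:
        for b in samples:
            assert addx(shift, a, b) == slli_then_add(shift, a, b)
            assert subx(shift, a, b) == slli_then_sub(shift, a, b)
print("equivalent")
```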
Then don't look at RISC-V, you may get dizzy with how many options/extensions you'll find :). Xtensa processors are meant to be heavily customised at a very deep level depending on specific use cases, so it doesn't make much sense to spend logic gates on option availability/feature presence detection - you already know what you're running on. As far as I know, LX7 is a superset of LX6 (
Not at all, but I don't think it is really useful to split decorators between "xtensa" and "xtensawin". I'd say it makes more sense to have a single "xtensa" decorator and then check at runtime what you're running on/targeting (see
The rationale for waiting on the maintainers before moving forward is this: by not bringing in a full LX6 + LX7 inline assembler implementation, with tests, documentation changes, esp32 port changes, and changes to test scripts, as a single monolithic chunk, the maintainers would need less time to review the code and suggest changes, thus establishing a baseline for what the Xtensa code should look like. Further changes would follow the guidelines established in the previous PR and take less review time in the long run. If the maintainers say they're OK with a single ESP8266+ESP32 PR, then I'll update the whole lot, though.
If that PR is going to be merged, then I doubt the maintainers would reject additional documentation. Extra tests for new opcodes once they're added wouldn't hurt either, but let's see what the maintainers think first. Thanks for the offer!
Description
It would be great to have an inline assembler or a way to pass machine code (already precompiled).
Python is great, but it is quite slow for many timing-sensitive applications.
It would also be useful for adding features on the fly.
I see two possible implementations:
The classic one: asm("nop; nop;")
Or a byte array containing the code.
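For the second option, the "code" would simply be pre-assembled machine code bytes. The sketch below only builds such a buffer in Python; it assumes the standard encodings NOP = 0x0020F0 (3 bytes) and NOP.N = 0xF03D (2 bytes, Code Density option), stored little-endian. Actually executing the buffer would require firmware support for copying it to executable memory and calling it, which is what this request asks for.

```python
# Sketch: "nop; nop;" (plus a narrow nop) as pre-assembled Xtensa machine code.
# Assumed encodings: NOP = 0x0020F0 (3 bytes), NOP.N = 0xF03D (2 bytes).

NOP = (0x0020F0).to_bytes(3, "little")
NOP_N = (0xF03D).to_bytes(2, "little")

code = bytearray()
code += NOP
code += NOP
code += NOP_N  # mixing 3- and 2-byte instructions is legal on cores with Code Density
print(code.hex())  # f02000f020003df0
```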
Code Size
Shouldn't be much.
Implementation
I hope the MicroPython maintainers or community will implement this feature.
Code of Conduct
Yes, I agree