Inline assembler/c code (ESP32). · Issue #16594 · micropython/micropython · GitHub

Inline assembler/c code (ESP32). #16594

Open
0wwafa opened this issue Jan 16, 2025 · 9 comments
Labels
enhancement Feature requests, new feature implementations port-esp32

Comments

@0wwafa
0wwafa commented Jan 16, 2025

Description

It would be great to have an inline assembler or a way to pass machine code (already precompiled).
Python is great but quite slow for many timing sensitive applications.
It would be also useful to add features on the fly.

I see two possible implementations:
The classic one asm("nop; nop;")
Or a byte array containing the code.

Code Size

Shouldn't be much.

Implementation

I hope the MicroPython maintainers or community will implement this feature

Code of Conduct

Yes, I agree

@0wwafa 0wwafa added the enhancement Feature requests, new feature implementations label Jan 16, 2025
@0wwafa 0wwafa changed the title Inline assembler/code. Inline assembler/code (ESP32). Jan 16, 2025
@agatti
Contributor
agatti commented Jan 16, 2025

In theory the ESP8266 inline assembler code could be repurposed for this (I believe the opcodes are almost the same between the ESP8266 and ESP32), but the way function calls are handled is quite different [1].

I'm not an expert on Xtensa matters, so I can't say whether supporting the windowed ABI is easy or not; maybe somebody with direct Xtensa experience can chime in with more detailed information. This looks interesting though, maybe I can give it a try once I get some free time for non-RISCV work.


[1] See also https://sachin0x18.github.io/posts/demystifying-xtensa-isa/#calling-convention

@0wwafa 0wwafa changed the title Inline assembler/code (ESP32). Inline assembler/c code (ESP32). Jan 16, 2025
@rkompass
rkompass commented Jan 17, 2025

> This looks interesting though, maybe I can give it a try once I get some free time for non-RISCV work.

@agatti I would love it if you found the time to start work on this.

I studied the ESP8266 inline assembler in order to use it in my CRC code. The benchmarks there showed that the Xtensa architecture is on par with ARM with respect to performance.
But even the ESP8266 inline assembler is not complete yet: some important instructions, like bit shifts, had to be emitted via data() statements.
The preliminary asm_xtensa docs resulted from this work.
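As an illustration of what emitting an instruction via data() involves, here is a minimal sketch that packs the six 4-bit fields of a 24-bit Xtensa RRR-format instruction into three little-endian bytes. The helper name `encode_rrr` is hypothetical, the field layout and the NOP field values are given from memory of the Xtensa ISA manual rather than taken from this thread, and the `data(1, ...)` form assumes the same `data(size, d0, d1, ...)` signature that the asm_thumb documentation describes. It runs on a desktop Python and only computes the bytes.

```python
def encode_rrr(op2, op1, r, s, t, op0=0):
    """Pack the six 4-bit RRR fields into three little-endian bytes.

    Assumed layout (most significant nibble first): op2, op1, r, s, t, op0.
    """
    fields = (op2, op1, r, s, t, op0)
    if any(not 0 <= f <= 0xF for f in fields):
        raise ValueError("every RRR field must fit in 4 bits")
    word = 0
    for f in fields:
        word = (word << 4) | f
    # Xtensa on the ESP parts is little-endian, so the low byte comes first.
    return list(word.to_bytes(3, "little"))


# NOP is commonly listed as the 24-bit word 0x0020F0 (assumption).
nop_bytes = encode_rrr(op2=0, op1=0, r=2, s=0, t=15)

# Text of a data() statement that would place those bytes in the code stream.
data_statement = "data(1, {})".format(
    ", ".join("0x{:02x}".format(b) for b in nop_bytes)
)
print(data_statement)
```

Any other RRR instruction would need its field values checked against the ISA reference before being used this way.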

I was wondering what prevented the completion of the inline assembler for the ESP architecture. I guess it was the necessity to prioritize other work at the time. Otherwise at least the 8266 assembler would be completed.

> I can't say whether supporting the windowed ABI is easy or not

The ESP32 viper emitter emits code according to this ABI. So from my limited understanding this problem seems to be solved already. It appears as if the completion of the inline assembler was stepped over.

Now that you have completed the inline assembler for RISC-V, Xtensa is the only important architecture still missing a complete inline assembler.
I will incorporate the inline assembler for RV32 as an optimization option for my CRC code soon.

I would love to help in development, testing and documentation on this. But I didn't start to work on it myself, because I had the feeling that it is above my level of expertise.

@agatti
Contributor
agatti commented Jan 22, 2025

> The ESP32 viper emitter emits code according to this ABI.

Viper code doesn't need to handle function calls if I recall correctly, that's the native emitter. The native emitter does indeed have code for windowed function prologues/epilogues but the inline assembler doesn't use it, it has its own thing (same for the Thumb and RV32 inline assembler implementations). I'll need to read up on this - it doesn't look too hard but I haven't written much Xtensa assembler myself.

> I guess it was the necessity to prioritize other work at the time. Otherwise at least the 8266 assembler would be completed. [...] It appears as if the completion of the inline assembler was stepped over.

Looks like it. I've taken a look, it shouldn't be too difficult to add the remaining opcodes. It won't look as nice and regular as the RV32 assembler (I'm biased :)), but the 8266 has tighter space requirements so compromises were made back then and will still need to be made going forward.

> I will incorporate the inline assembler for RV-32 as optimization option for my CRC code soon.

If you do, you may want to also consider merging #16524 to your local source tree - it contains a couple of fixes that aren't super-urgent right now but can help nonetheless.

> I would love to help in development, testing and documentation on this. But I didn't start to work on it myself, because I had the feeling that it is above my level of expertise.

Adding opcodes is not that difficult - the complicated work has already been done (and it's usually figuring out opcode encodings and how to handle relative offsets for jumps and branches), but I understand if you prefer delegating this to others. Still, I'll ping you once I have something running so you can test the result if you feel like it - mostly to figure out which opcodes are present on the ESP32 and which ones aren't on the 8266. From what I understood Xtensa doesn't really have a minimum set of opcodes that must be present, unlike RISC-V profiles, and nothing stops you from synthesizing an Xtensa core having a partial implementation of a specific profile for example.
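To make the "relative offsets for jumps and branches" part concrete, the following is a minimal sketch of the arithmetic an assembler has to do when resolving a branch target. The function name `branch_offset` is hypothetical. The 8-bit signed field and the bias of 4 bytes from the branch address reflect how I understand Xtensa conditional branches to work; both are assumptions here, not something stated in this thread, and other instruction formats use different field widths.

```python
def branch_offset(branch_addr, target_addr, bits=8, pc_bias=4):
    """Return the encoded offset field for a PC-relative branch.

    The offset is measured from branch_addr + pc_bias (assumed bias) and
    must fit in a signed field of the given width.
    """
    offset = target_addr - (branch_addr + pc_bias)
    lowest = -(1 << (bits - 1))
    highest = (1 << (bits - 1)) - 1
    if not lowest <= offset <= highest:
        raise ValueError(
            "target out of range: offset {} not in [{}, {}]".format(
                offset, lowest, highest
            )
        )
    # Two's-complement representation truncated to the field width.
    return offset & ((1 << bits) - 1)


forward = branch_offset(0x10, 0x20)   # jump ahead by a few instructions
backward = branch_offset(0x20, 0x10)  # loop back to an earlier label
print(forward, backward)
```

A target outside the representable range raises `ValueError`, which is the point where a real assembler would report an error or pick a longer instruction sequence.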

@agatti
Contributor
agatti commented Jan 29, 2025

@rkompass I'm almost done with the core architecture opcodes. I still need to figure out how the PC-relative load/store instructions should be emitted properly (mixed 2- and 3-byte instructions, but with word-alignment constraints...), but they're implemented nonetheless. If you want to try that out, feel free to take a look at https://github.com/agatti/micropython/tree/xtensa-inline (no guarantees that this will stay there or that I won't force-push to it, though).
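The alignment problem mentioned above comes from instruction streams whose length is not a multiple of four bytes. As a minimal sketch of the padding arithmetic only (not of how the branch linked above solves it), the hypothetical helper below takes a list of instruction sizes and returns how many pad bytes are needed before the next word-aligned slot.

```python
def literal_padding(instruction_sizes, align=4):
    """Return the pad bytes needed to reach the next aligned offset.

    instruction_sizes is a sequence of byte lengths (2 for narrow, 3 for
    regular instructions); the stream is assumed to start at an aligned
    address.
    """
    if any(size not in (2, 3) for size in instruction_sizes):
        raise ValueError("only 2- and 3-byte instructions are expected")
    offset = sum(instruction_sizes)
    return (-offset) % align


# Three instructions totalling 8 bytes are already aligned.
aligned = literal_padding([3, 3, 2])
# A 3-byte plus a 2-byte instruction ends at offset 5 and needs 3 pad bytes.
unaligned = literal_padding([3, 2])
print(aligned, unaligned)
```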

@rkompass

Congratulations @agatti !
Will try the new opcodes soon e.g. by changing my CRC code to include them.

@rkompass

With a different search I came across PR 5082 today.
I assume you already viewed it because of the comments there. If not, here it is.

@agatti
Contributor
agatti commented Feb 17, 2025

> With a different search I came across PR 5082 today. I assume you already viewed it because of the comments there. If not, here it is.

Thanks for the link. I'll tackle ESP32-supported opcodes when/if the ESP8266 PR is merged (or if the maintainers prefer to also have ESP32 support to warrant a merge). I also do not own an ESP32-S3 board to test the LX7-only opcodes on, so that bit will have to wait.

Right now I explicitly tested things on a random ESP8266 board I repurposed from another project. The official documentation from Cadence doesn't say whether an opcode is LX6- or LX7-specific, so I had to literally assemble the whole lot with GNU as and .byte statements and then discard the opcodes that triggered a crash on the board I had :)
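A probe file of that kind can be generated rather than typed by hand. The sketch below is one possible way to turn candidate encodings into `.byte` lines for GNU as; the helper name `byte_directive` and the two candidate words are illustrative assumptions (they are the values commonly listed for NOP and NOP.N), not the list that was actually tested.

```python
def byte_directive(encoding, size):
    """Format an instruction word as a GNU as .byte line (little-endian)."""
    raw = encoding.to_bytes(size, "little")
    return ".byte " + ", ".join("0x{:02x}".format(b) for b in raw)


# (word, size in bytes) pairs; purely illustrative candidates.
candidates = [(0x0020F0, 3), (0xF03D, 2)]
probe_lines = [byte_directive(word, size) for word, size in candidates]
print("\n".join(probe_lines))
```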

@rkompass

Haha.:-)
In general I would like to but am afraid I cannot help you on this, because of limited expertise.
From viewing the PR I got the impression that you are thorough in implementing all available opcodes.
Comparing that to MP's asm_thumb: There only a minimal selection of opcodes is made available (it could really be more).
So the increase of the binary perhaps had to be weighed against availability of more opcodes. Perhaps it was only a restriction of the work to be invested then.

Looking again into the XTENSA ISA Reference Manual I'm impressed by the many many options for architectural additions.
I was looking into info how to read out the applied/activated options from the processor but did not find any.
Do you know something about that? Or is the difference between LX6 or LX7 exactly that: activated options?

> Thanks for the link. I'll tackle ESP32-supported opcodes when/if the ESP8266 PR is merged (or if the maintainers prefer to also have ESP32 support to warrant a merge).

How difficult would it be to activate the '@micropython.asm_xtensawin' decorator option? Independent from the implementation of additional opcodes?

If you need help in writing documentation: I should be able to do some work here..

@agatti
Contributor
agatti commented Feb 17, 2025

> So the increase of the binary perhaps had to be weighed against availability of more opcodes. Perhaps it was only a restriction of the work to be invested then.

I can understand not adding opcodes for address generation (addx2/subx2/...), leaving out some arithmetic opcodes, or not having the full set of branches, as you can easily write equivalent assembler code without them and minimise the MicroPython footprint taken by the feature. Not having shifts, on the other hand...
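The claim that the address-generation opcodes are easy to replace can be checked with a small model. The sketch below simulates 32-bit registers in plain Python and assumes the usual definitions ADDX2 = (as << 1) + at and SUBX2 = (as << 1) - at; the function names are hypothetical and nothing here touches real hardware.

```python
MASK32 = 0xFFFFFFFF  # registers are modelled as unsigned 32-bit values


def add(a, b):
    return (a + b) & MASK32


def sub(a, b):
    return (a - b) & MASK32


def addx2(s, t):
    """Model of ADDX2: (s << 1) + t, truncated to 32 bits."""
    return ((s << 1) + t) & MASK32


def subx2(s, t):
    """Model of SUBX2: (s << 1) - t, truncated to 32 bits."""
    return ((s << 1) - t) & MASK32


def addx2_plain(s, t):
    """Same result using two ADDs: doubling is just s + s."""
    return add(add(s, s), t)


def subx2_plain(s, t):
    """Same result using one ADD and one SUB."""
    return sub(add(s, s), t)


samples = [(0, 0), (1, 5), (0x80000001, 5), (0xFFFFFFFF, 0xFFFFFFFF)]
all_equal = all(
    addx2(s, t) == addx2_plain(s, t) and subx2(s, t) == subx2_plain(s, t)
    for s, t in samples
)
print(all_equal)
```

The same trick only covers a left shift by one position per ADD, which is why missing general shift instructions are much harder to work around.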

Also, I've had to skip a few opcodes due to limitations of the native emitter infrastructure - forward-only jump opcodes aren't implemented.

> Looking again into the XTENSA ISA Reference Manual I'm impressed by the many many options for architectural additions. I was looking into info how to read out the applied/activated options from the processor but did not find any. Do you know something about that? Or is the difference between LX6 or LX7 exactly that: activated options?

Then don't look at RISC-V, you may get dizzy with how many options/extensions you'll find :). Xtensa processors are meant to be heavily customised at a very deep level depending on specific use cases, so it doesn't make much sense to spend logic gates on option availability/feature presence detection - you already know what you're running on.

As far as I know, LX7 is a superset of LX6 (Tensilica Cadence guarantees that LX6 code will work unmodified on LX7), so not only are there new options available in LX7, but existing options also have additional opcodes. Having an Xtensa QEMU target would have helped with testing, but I wonder whether it makes sense to add that now.

> How difficult would it be to activate the '@micropython.asm_xtensawin' decorator option? Independent from the implementation of additional opcodes?

Not at all, but I don't think it is really useful to split decorators between "xtensa" and "xtensawin". I'd say it makes more sense to have a single "xtensa" decorator and then check at runtime what you're running on/targeting (see tests/feature_check/target_info.py), but ultimately the maintainers have the last word on this :)
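A runtime check of that kind could look roughly like the sketch below. It decodes the native architecture from `sys.implementation._mpy` using the table from the MicroPython .mpy documentation as I recall it; the table contents, the shift by 10 bits, and the helper names are assumptions that may not match every MicroPython version, and this is not the code in target_info.py. On CPython the attribute does not exist, so the result is simply `None`.

```python
import sys

# Architecture names indexed by the value stored above bit 10 of _mpy
# (assumed ordering, taken from the .mpy file documentation).
ARCH_NAMES = [
    None, "x86", "x64",
    "armv6", "armv6m", "armv7m", "armv7em", "armv7emsp", "armv7emdp",
    "xtensa", "xtensawin", "rv32imc",
]


def native_arch(mpy_value):
    """Return the native architecture name, or None if unknown."""
    if mpy_value is None:
        return None
    index = mpy_value >> 10
    if index >= len(ARCH_NAMES):
        return None
    return ARCH_NAMES[index]


def uses_windowed_abi(mpy_value):
    """True when the target reports the windowed Xtensa ABI."""
    return native_arch(mpy_value) == "xtensawin"


current = native_arch(getattr(sys.implementation, "_mpy", None))
print(current)
```

With a check like this, a single decorator could choose the call0 or windowed prologue internally instead of exposing two decorator names.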

My rationale for waiting on the maintainers before moving forward is that by not bringing in a full LX6 + LX7 inline assembler implementation with tests, documentation changes, esp32 port changes, and changes to test scripts in a single monolithic chunk, it would take less time for the maintainers to review the code and suggest changes to be made, thus establishing a baseline for what the Xtensa code should look like. Further changes would follow the guidelines implemented in the previous PR and take less review time in the long run. If the maintainers say they're OK with a single ESP8266+ESP32 PR then I'll update the whole lot, though.

> If you need help in writing documentation: I should be able to do some work here..

If that PR is going to be merged, then I doubt the maintainers would reject additional documentation. Extra tests for new opcodes once they're added wouldn't hurt either, but let's see what the maintainers think first. Thanks for the offer!

4 participants