Towards persistent bytecode #222
Comments
I think it would be good if it could be done without relinking (perhaps at a slight performance hit). Maybe we need a double indirection (so that the python code indexes into a local table (which is in RAM), and that table is what gets "linked"). The ideal would be that the bytecodes can stay in flash and not need to be loaded into RAM at all. |
One interesting note here is that the GC won't be able to detect object references in bytecode, because it expects pointers to be word-aligned, while bytecode is byte-aligned. This shouldn't affect uPy currently, but may complicate some advanced optimizations (for example, CPy appears to support more complex constant expressions than just strings, such as a tuple of constant items. An implementation of that which would construct such an object on the heap and then patch its address into bytecode would be erroneous).
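To make the alignment point concrete, here is a minimal sketch of a conservative scan that only inspects word-aligned slots. It is a simplified model and not uPy's actual GC: the 4-byte word size, the heap address range, the opcode byte 0x17 and the name scan_words are all assumptions made for illustration.

```python
import struct

WORD = 4                    # assumed word size in bytes
HEAP_START = 0x20000000     # hypothetical heap range
HEAP_END = 0x20010000


def scan_words(mem, heap_start, heap_end):
    """Return every word-aligned value in mem that looks like a heap pointer."""
    found = []
    for off in range(0, len(mem) - WORD + 1, WORD):
        (word,) = struct.unpack_from("<I", mem, off)
        if heap_start <= word < heap_end:
            found.append(word)
    return found


ptr = 0x20001000
# Pointer stored in an aligned slot: the scan sees it.
aligned = struct.pack("<I", ptr) + bytes(4)
# Pointer patched into bytecode right after a 1-byte opcode: it straddles
# two aligned slots, so neither slot holds the pointer value.
in_bytecode = b"\x17" + struct.pack("<I", ptr) + bytes(3)

print(scan_words(aligned, HEAP_START, HEAP_END))
print(scan_words(in_bytecode, HEAP_START, HEAP_END))
```

Under this model the object referenced only from the bytecode would look unreachable, which is the failure mode described above.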
Double indirection is a huge waste of memory ;-). Consider for example that varlen encoding was used to conserve memory; augmenting each varlen "local id" with a table that maps it to a global id will then take more memory than using the global id directly (unless we aim for a poor man's Huffman encoding, hoping to get a few local ids that are used again and again).
Well, bytecode in flash is static, so it can be prelinked once, voila. The question is how to support dynamic loading of bytecode modules.
I guess I'm missing something then. Wouldn't that mean that you'd need to load all of the modules in exactly the same order? So that the indices all get assigned the same each time?
Do you mean the static or the dynamic case? With static, I don't see any issues (except for the toolset needed to make it work as expected). For dynamic, that's exactly the problem. And the way to resolve it is that persistent bytecode (a la .pyc) uses local module constant ids, and also includes a constant pool. Then, during loading (into RAM of course), a "linker" will push constants to the global pool, and patch the bytecode to replace local ids with global ids (or pointers right away). e.g.: persistent:
loaded:
Note that for qstrs, we have a global intern table, so they're not affected by the GC issue mentioned above, and we can put pointers straight into bytecode, saving a double-indirection lookup.
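The load-time "linker" described above could be sketched roughly as follows. This is only an illustration under stated assumptions: the instruction layout (one opcode byte followed by a fixed 2-byte little-endian operand), the LOAD_CONST value and the function name link are all hypothetical, and real uPy bytecode uses varlen operands.

```python
LOAD_CONST = 0x01  # hypothetical opcode value, for illustration only


def link(bytecode, local_pool, global_pool):
    """Return a patched copy of bytecode with local ids replaced by global ids.

    Constants missing from global_pool are appended to it.
    """
    local_to_global = []
    for const in local_pool:
        if const not in global_pool:
            global_pool.append(const)
        local_to_global.append(global_pool.index(const))

    patched = bytearray(bytecode)
    # Assumed layout: every instruction is 1 opcode byte + 2 operand bytes.
    for pos in range(0, len(patched), 3):
        local_id = int.from_bytes(patched[pos + 1:pos + 3], "little")
        patched[pos + 1:pos + 3] = local_to_global[local_id].to_bytes(2, "little")
    return bytes(patched)


global_pool = ["print", "len"]          # constants already known to the runtime
local_pool = ["x", "print"]             # constant pool shipped with the module
persistent = bytes([LOAD_CONST, 0, 0,   # refers to local id 0 ("x")
                    LOAD_CONST, 1, 0])  # refers to local id 1 ("print")
loaded = link(persistent, local_pool, global_pool)
```

The patched copy has to live in RAM, which is why this approach does not by itself allow execution straight from flash.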
Note that the linker could re-encode the bytecode, encoding variable-length qstr ids with the correct value for the current runtime. Var-len in bytecode arguments is an important feature to keep the bytecode size down. Complex constants (eg a tuple of strings) are built dynamically in uPy.
Bumping this discussion due to #386. And here's something I've wanted to respond to for a while:
Byte re-encoding is complicated! I'm not sure it's worth it, if there can be other ways to deal with it. For example, why not use a single global id width that is fixed at compile time? 2 bytes should be plenty for an MCU, 3 bytes will be plenty for any realistic uPy app, but 4 bytes are still possible.
Another observation: it could be that we have enough static qstrs to guarantee that any user-defined one would take at least 2 bytes, and per the previous comment, for MCU usage, 2 bytes is just enough. Unfortunately, we already have more than 128 static qstrs, and soon will have more than 256. I don't know if it's practical to suggest introducing separate bytecodes for loading builtin vs user qstrs, but if it is, we can actually have multiple bytecodes for builtins of different ranges, which would resolve this situation.
The main thing with qstrs in bytecode are the LOAD_{NAME,GLOBAL,ATTR,METHOD}, and to a lesser extent the STORE_{xx} variants. I would guess that LOAD_METHOD is the most common (but just a guess). So all these would need builtin vs user variants. |
Would using ELF format solve some of these problems? I'd then recommend Contiki's ELF loader. It would also allow one to load a native module... Or we are hoping to have something more of an ABI? |
@errordeveloper, I don't think that using ELF would help at all with bytecode - a custom lean format will be much easier to implement and support. But your link may be helpful for #583, if someone ever gets to it.
Anything new on this issue? I'm working on a project that would like to use MP but would also like to (need to?) use persistent bytecode. Follow up: If the execution environment is sufficiently constrained, will persistent bytecode work? For example we'll be running in an embedded environment with no underlying file system. The script(s) that are run will be identical in content and order every time they're run. Additional constraints could possibly be supported.
@TWHanson: Perfect, we're looking for people who actually need this feature to implement it!
+1 for a "ROM bytecode" support for smaller micros... and maybe even executable via REPL, for tests? Something like |
uPy doesn't need a filesystem to function. You can still import modules so long as they are builtin or frozen. |
Ok, I have persistent bytecode working! I will post code later, with a proper description, but for now I'll just explain the basics of how it works (note it is still WIP). Example input file (persist.py):

    x = 1
    print(x)
    print(x * 10)
    def foo(z):
        print("persist!", z)
    foo(123)

Then using unix port, this is compiled to (persist.mpc):
Then you can import this same compiled code in unix (import persist) and also stmhal, and it works correctly. The .mpc file contains a header, then bytecode, then a constant table, then nested code objects. In more detail:
The main thing is that qstrs have an indirection into the constant table. So the bytecode itself is completely static/const and does not need to be changed when loading. Thus it can be executed directly from flash. When a .mpc file is loaded, the bytecode is loaded verbatim, then the constant table is generated. The VM now needs a pointer to the bytecode and the constant table, in order to execute the code.

I think this is a good starting point for a discussion of exactly what we want from persistent bytecode, and how to modify the above scheme to make it something useful for our needs. So fire away with comments/criticism/etc!
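As a rough illustration of the indirection described above (not the actual .mpc loader), the sketch below keeps the bytecode bytes untouched and builds a per-module constant table at load time. QstrPool, load_module, operand_names and the operands-only "bytecode" are hypothetical names and simplifications.

```python
class QstrPool:
    """Hypothetical stand-in for a runtime's global interned-string table."""

    def __init__(self, preloaded=()):
        self._strings = list(preloaded)
        self._ids = {s: i for i, s in enumerate(self._strings)}

    def intern(self, s):
        if s not in self._ids:
            self._ids[s] = len(self._strings)
            self._strings.append(s)
        return self._ids[s]

    def lookup(self, qstr_id):
        return self._strings[qstr_id]


def load_module(bytecode, names, pool):
    """Keep the bytecode verbatim; build the constant table (in RAM)."""
    const_table = [pool.intern(name) for name in names]
    return bytecode, const_table


def operand_names(bytecode, const_table, pool):
    """Resolve each 1-byte operand through the constant table (opcodes omitted)."""
    return [pool.lookup(const_table[index]) for index in bytecode]


BYTECODE = bytes([0, 1, 0])   # operands only: indexes into the local table
NAMES = ["x", "print"]        # names shipped alongside the bytecode

small = QstrPool()
large = QstrPool(preloaded=[f"builtin{i}" for i in range(300)])
code_a, table_a = load_module(BYTECODE, NAMES, small)
code_b, table_b = load_module(BYTECODE, NAMES, large)
```

The two runtimes assign different global ids to the same names, yet both execute the identical, unmodified bytecode; only the small table differs.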
Amazing!! :-)
We'll need to release next version as 1.6 ;-). Some quick comments:
That's huge, can we reduce this to 2-3 bits? ;-)
Do you actually mean that all constant objects are referenced indirectly via the constant table? That's the only way I could quickly interpret it. When I thought about how to implement it, the first idea on the list was to give up varlen encoding for qstrs, to be able to patch in references to them on "linking" (i.e. persistent bytecode has e.g. local qstr ids (or perhaps even offsets into a table), on loading each qstr is added to the global table, and its global id is patched back). Of course, having r/o bytecode is more beneficial (well, depends on the environment of course), but it's unclear how to deal with the global qstr table. Also, to clarify, a "code chunk" starts with "- bytecode length (encoded as var-uint)" and ends with "- literal data for all constants (eg bignum, float)", optionally followed by recursive code chunks of the same structure, with the top-level chunk being module-level code? It's also not clear how code chunks are linked together - we used to encode a direct pointer in bytecode. Well, I guess the code will clarify some of these questions.
All "external pointing entities" (qstr, const, nested code blocks) are now
encoded in the bytecode as an index (usually 1 byte) into the local
constant table. This table is created when loading. Pointers to nested code
blocks are stored in this table and are made when unpacking the nested
persistent bytecode.
Will post code tomorrow! It's a bit messy at the moment.
Sweet. I can see some add-on utilities that would be useful. If you don't have it already, having some type of version information so that a newer version of the firmware can tell if it's capable of executing some older bytecodes would be useful. So something that parallels frozen modules but for precompiled code. Having a mechanism to "chunk" up the bytecode so that it fits conveniently into 512-byte file system blocks might be something worth exploring. That way the persistent bytecode could be stored as a file, and even though the file isn't contiguous it could still be executed directly from flash. Storing and manipulating the bytecode blocks outside of the filesystem probably needs some type of index. Due to the size of the flash pages, we'd probably need some host side support for managing persistent bytecode in say upper flash. The host tool could transfer the existing code from flash, and allow portions to be added/updated and then written back to flash on the board. If an SD card was available, then it could be used for temporary storage.
Great news! In other words - "VM runs bytecode from flash", is that right? I'm looking forward to lower RAM requirements :) Great news for low-power, low cost platforms. So what about the different code emitters (native, viper)? They are bound to a specific VM I believe - would they still be supported? Can bytecode segments be tagged so the VM knows how to interpret them? Thinking of it - Viper, as I understand, emits thumb instructions so the precompiled output would actually be a normal hex file and wouldn't require a VM on the micro ;) Looking forward to some more details!
Version 2 of persistent bytecode posted at #1577.
Yes there is a header with version info.
That's really difficult. You'd need to heavily modify the VM to take into account the fact that the next byte of the bytecode may be in another 512-byte-block (eg decoding a variable-length integer, the first byte might be in one block, the second byte in the next block). I don't think it's worth it. Instead it's better/easier to just freeze the precompiled bytecode into the firmware.
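The block-boundary problem can be shown with a small sketch. The 7-bit, most-significant-group-first encoding below is an assumption chosen for illustration (the exact byte layout used by the VM may differ), and both decoder functions are hypothetical.

```python
BLOCK_SIZE = 512


def decode_varuint(buf, pos):
    """Decode from one contiguous buffer: no boundary checks needed."""
    value = 0
    while True:
        byte = buf[pos]
        pos += 1
        value = (value << 7) | (byte & 0x7F)
        if not byte & 0x80:
            return value, pos


def decode_varuint_blocks(blocks, block_no, offset):
    """Decode from non-contiguous blocks: every byte fetch must check for a hop."""
    value = 0
    while True:
        if offset == BLOCK_SIZE:
            block_no += 1
            offset = 0
        byte = blocks[block_no][offset]
        offset += 1
        value = (value << 7) | (byte & 0x7F)
        if not byte & 0x80:
            return value, block_no, offset


# 10000 encodes as the two bytes 0xCE 0x10; place them across a block boundary.
blocks = [bytes(BLOCK_SIZE - 1) + b"\xce", b"\x10" + bytes(BLOCK_SIZE - 1)]

result = decode_varuint_blocks(blocks, 0, BLOCK_SIZE - 1)

try:
    decode_varuint(blocks[0], BLOCK_SIZE - 1)
    contiguous_ok = True
except IndexError:
    contiguous_ok = False   # the simple decoder runs off the end of block 0
```

The extra check sits on the hot path of every operand fetch, which hints at why such a change to the VM would be invasive.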
Yes, that can now be done using frozen bytecode.
Yes, about 20k less for Thumb2 archs. But you don't get a REPL anymore :(
Not yet supported, but would be possible to make persistent native code blocks. |
Persistent frozen bytecode is now merged in master. |
…hon#222) The frozen module `_boot.py` was not being loaded on restart because `pyexec_frozen_module()` did not know about the new `.frozen` pseudo-directory. Updated lower-level routine to look in the right place. Also made ".frozen" and related values be `#define`s.
So, I had a quick look at uPy bytecode objects and compared that to CPython's .pyc (based on http://nedbatchelder.com/blog/200804/the_structure_of_pyc_files.html).
As expected, the biggest difference is that .pyc includes a pool of the constants used by a given module, and bytecode args refer to that local constant pool, whereas bytecode generated by uPy refers to the global VM pool (at least for qstrs). And as uPy uses varlen encoding for qstr ids, that makes the bytecode unsuitable for persistence and reloading. For example, a string which had a qstr id of 10 may have an id of 10000 in another environment, which does not fit into the varlen encoding for 10.
As being able to compile to standalone bytecode and load it on its own is apparently an important feature, then IMHO varlen encoding in bytecode arguments should be given up, and a full mp_obj_t value should be used instead (in persistent bytecode that would be an id in the local constant pool; on loading, a "linker" would patch it to be a global VM id or a direct mp_obj_t value).
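The qstr id 10 versus 10000 example can be checked with a short sketch. The encoding here (7 bits per byte, most significant group first, high bit set on every byte except the last) is an assumption for illustration, and encode_varuint and the LOAD_NAME value are hypothetical.

```python
LOAD_NAME = 0x1B  # hypothetical opcode value, for illustration only


def encode_varuint(n):
    """Encode a non-negative int in 7-bit groups, most significant group first."""
    out = [n & 0x7F]                     # last byte: continuation bit clear
    n >>= 7
    while n:
        out.append(0x80 | (n & 0x7F))    # earlier bytes: continuation bit set
        n >>= 7
    return bytes(reversed(out))


saved = bytes([LOAD_NAME]) + encode_varuint(10)      # as written by one runtime
needed = bytes([LOAD_NAME]) + encode_varuint(10000)  # what another runtime needs

print(len(saved), len(needed))
```

Because the instruction grows by a byte, the new id cannot be patched in place; every later offset in the bytecode would shift.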