Towards persistent bytecode #222

pfalcon · 2014-01-24T23:22:48Z

So, I had a surface look at uPy bytecode objects and compared that to CPython's .pyc (based on http://nedbatchelder.com/blog/200804/the_structure_of_pyc_files.html).

As expected, biggest difference is that .pyc includes pool of constants used by given module, and bytecode args refer to local constant pool, whereas bytecode generated by uPy refers to global VM pool (at least for qstr's). And as uPy using varlen encoding for qstr id's, that makes bytecode not suitable for persistence and reloading. For example, a string which had qstr id of 10, in another environment may have it as 10000, and not fit into varlen encoding for 10.

As being able to compile to standalone bytecode and load it on its own is apparently an important feature, them IMHO varlen encoding in bytecode arguments should be given up, and full mp_obj_t value should be used instead (in persistent bytecode that would be id in local constant pool, on loading a "linker" would patch it to be a global VM id or direct mp_obj_t value).

dhylands · 2014-01-24T23:31:44Z

I think it would be good if it could be done without relinking (perhaps at a slight performance hit).

Maybe we need a double indirection (so that the python code indexes into a local table (which is in RAM), and that table is what gets "linked"). The ideal would be that the bytecodes can stay in flash and not need to be loaded into RAM at all.

pfalcon · 2014-01-24T23:35:26Z

One interesting note here is that GC won't be able to detect object references in bytecode, because it expects pointers to be word-aligned, while bytecode is byte-aligned. This shouldn't affect uPy currently, but may complicate some advanced optimizations (for example, CPy appears to support complex constant expressions than just strings, for example, tuple of constant items. An implementation of that which would construct such object on heap and then patch its address into bytecode would be erroneous).

pfalcon · 2014-01-24T23:40:36Z

Maybe we need a double indirection

Double indirection is huge waste of memory ;-). Consider for example that varlen encoding was used to conserve memory, then augmenting each varlen "local id" with a table to map it to global id will take more memory than using global id directly (unless we aim for poor man's huffman encoding, hoping to get few local id's used many times again and again).

The ideal would be that the bytecodes can stay in flash and not need to be loaded into RAM at all.

Well, bytecode in flash is static, so it can be prelinked once, voila. The talk is how to support dynamic loading of bytecode modules.

dhylands · 2014-01-25T00:11:40Z

Well, bytecode in flash is static, so it can be prelinked once, voila. The talk is how to support dynamic loading of bytecode modules.

I guess I'm missing something then.

Wouldn't that mean that you'd need to load all of the modules in exactly the same order? So that the indicies all get assigned the same each time?

pfalcon · 2014-01-25T00:21:11Z

Do you mean static or dynamic case? With static, I don't see any issues (except for toolset which make it work as expected).

For dynamic, that's exactly a problem. And the way to resolve it is that persistent bytecode (a-la .pyc) uses local module constant id's, as well as includes constant pool. Then, during loading (into RAM of course), "linker" will push constants to global pool, and patch bytecode to replace local id's with global id's (or pointers right away).

e.g.:

persistent:

LOAD_STR 2
...
CONST_POOL:
0: ...
1: ...
2: "foo"

loaded:

LOAD_STR 0x10001234
...
qstr pool:
...
567: 0x10001234

mem:
0x10001234: qstr("foo")

Note that for qstr, we have global inter table, so they're not affected by GC issue mentioned above, so we can put pointers straight into bytecode, saving double indirection lookup.

dpgeorge · 2014-01-25T00:59:06Z

Note that the linker could re encode the byte code, encoding variable length qstrs ids with the correct value for the current runtime.

Var-len in bytecode arguments is an important feature to keep the byte code size down.

Complex constants (eg a tuple of strings) are built dynamically in uPy.

pfalcon · 2014-03-28T19:04:07Z

Up discussion due to #386. And here's something I wanted to respond for a while:

Note that the linker could re encode the byte code, encoding variable length [...] ids

Byte re-encoding is complicated! I'm not sure it's worth it, if there can be other ways to deal with it. For example, why not use global static varying length set at compile-time. 2 bytes should be well enough for MCU, 3 bytes will be well enough for any realistic uPy app, but 4 bytes are still possible.

pfalcon · 2014-04-03T11:25:58Z

Another observation: it could be that we have enough static qstr's to guarantee that any user-defined would be at least 2 bytes, and per previous comment, for MCU usage, 2 byte is just enough.

Unfortunately, we already have more than 128 static qstrs, and soon will have more than 256. I don't know if it's practical to suggest introducing separate bytecodes for loading builtin vs user qstrs, but if it is, we actually can have multiple bytecodes for builtins of different ranges, so can resolve this situation.

dpgeorge · 2014-04-03T11:36:47Z

The main thing with qstrs in bytecode are the LOAD_{NAME,GLOBAL,ATTR,METHOD}, and to a lesser extent the STORE_{xx} variants. I would guess that LOAD_METHOD is the most common (but just a guess). So all these would need builtin vs user variants.

errordeveloper · 2014-05-07T07:29:17Z

Would using ELF format solve some of these problems? I'd then recommend Contiki's ELF loader. It would also allow one to load a native module... Or we are hoping to have something more of an ABI?

pfalcon · 2014-05-07T17:32:43Z

@errordeveloper , I don't think that using ELF would help anyhow with bytecode - a custom lean format will be much easier to implement and support. But your link may be helpful for #583, if someone will ever get to it.

TWHanson · 2015-06-09T16:17:28Z

Anything new on this issue? I'm working a project that would like to use MP but would also like to (need to?) use persistent bytecode.

Follow up: If the execution environment is sufficiently constrained, will persistent bytecode work? For example we'll be running in an embedded environment with no underlying file system. The script(s) that are run will be identical in content and order every time they're run. Additional constraints could possibly be supported.
Thx

pfalcon · 2015-06-09T17:20:51Z

@TWHanson: Perfect, we look for people who actually need this feature to implement it!

hoihu · 2015-06-16T18:57:59Z

+1 for a "ROM bytecode" support for smaller micros...

and maybe even executable via REPL, for tests?

Something like
blob = fh.read('test.pyc','rb')
exec_raw(blob)

dpgeorge · 2015-06-16T21:21:53Z

For example we'll be running in an embedded environment with no underlying file system.

uPy doesn't need a filesystem to function. You can still import modules so long as they are builtin or frozen.

dpgeorge · 2015-10-23T21:34:32Z

Ok, I have persistent bytecode working! I will post code later, with a proper description, but for now I'll just explain the basics of how it works (note it is still WIP).

Example input file (persist.py):

x = 1
print(x)
print(x * 10)

def foo(z):
    print("persist!", z)

foo(123)

Then using unix port, this is compiled to (persist.mpc):

00000000: 4d50 4330 3031 3203 0000 0000 000a 2427  MPC0012.......$'
00000010: 4964 0000 0000 00ff 8124 021c 031c 0464  Id.......$.....d
00000020: 0132 1c05 1c06 8add 6401 3260 0924 071c  .2......d.2`.$..
00000030: 0814 807b 6401 3211 5b09 0001 083c 6d6f  ...{d.2.[....<mo
00000040: 6475 6c65 3e0a 7065 7273 6973 742e 7079  dule>.persist.py
00000050: 0178 0570 7269 6e74 0178 0570 7269 6e74  .x.print.x.print
00000060: 0178 0366 6f6f 0366 6f6f 1b04 0000 0100  .x.foo.foo......
00000070: 000a 6140 0000 0000 0000 00ff 1d03 1604  ..a@............
00000080: b064 0232 115b 0500 0003 666f 6f0a 7065  .d.2.[....foo.pe
00000090: 7273 6973 742e 7079 017a 0570 7269 6e74  rsist.py.z.print
000000a0: 0870 6572 7369 7374 21                   .persist!

Then you can import this same compiled code in unix (import persist) and also stmhal, and it works correctly.

The .mpc file contains a header, then bytecode, then a constant table, then nested code objects. In more detail:

- 6 byte header, MPC001
- bytecode length (encoded as var-uint)
- bytecode data
- number of qstrs to "link" (var uint)
- number of constant objects (var uint)
- number of nested code chunks (var uint)
- literal data for all qstrs (encoded as var-uint length followed by actual str data)
- literal data for all constants (eg bignum, float)
- nested code chunks (starting with bytecode length)

The main thing is that qstrs have an indirection into the constant table. So the bytecode itself is completely static/const and does not need to be changed when loading. Thus it can be executed directly from flash.

When a .mpc file is loaded, the bytecode is loaded verbatim, then the constant table is generated. The VM now needs a pointer to the bytecode and the constant table, in order to execute the code.

I think this is a good starting point for a discussion of exactly what we want from persistent bytecode, and how to modify the above scheme to make it something useful for our needs. So fire away with comments/criticism/etc!

danicampora · 2015-10-23T21:42:24Z

Amazing!! :-)

On Oct 23, 2015, at 11:34 PM, Damien George notifications@github.com wrote:

Ok, I have persistent bytecode working! I will post code later, with a proper description, but for now I'll just explain the basics of how it works (note it is still WIP).

Example input file (persist.py):

x = 1
print(x)
print(x * 10)

def foo(z):
print("persist!", z)

foo(123)
Then using unix port, this is compiled to (persist.mpc):

00000000: 4d50 4330 3031 3203 0000 0000 000a 2427 MPC0012.......$'
00000010: 4964 0000 0000 00ff 8124 021c 031c 0464 Id.......$.....d
00000020: 0132 1c05 1c06 8add 6401 3260 0924 071c .2......d.2`.$..
00000030: 0814 807b 6401 3211 5b09 0001 083c 6d6f ...{d.2.[....<mo
00000040: 6475 6c65 3e0a 7065 7273 6973 742e 7079 dule>.persist.py
00000050: 0178 0570 7269 6e74 0178 0570 7269 6e74 .x.print.x.print
00000060: 0178 0366 6f6f 0366 6f6f 1b04 0000 0100 .x.foo.foo......
00000070: 000a 6140 0000 0000 0000 00ff 1d03 1604 ..a@............
00000080: b064 0232 115b 0500 0003 666f 6f0a 7065 .d.2.[....foo.pe
00000090: 7273 6973 742e 7079 017a 0570 7269 6e74 rsist.py.z.print
000000a0: 0870 6572 7369 7374 21 .persist!
Then you can import this same compiled code in unix (import persist) and also stmhal, and it works correctly.

The .mpc file contains a header, then bytecode, then a constant table, then nested code objects. In more detail:

6 byte header, MPC001

bytecode length (encoded as var-uint)

bytecode data

number of qstrs to "link" (var uint)

number of constant objects (var uint)

number of nested code chunks (var uint)

literal data for all qstrs (encoded as var-uint length followed by actual str data)

literal data for all constants (eg bignum, float)

nested code chunks (starting with bytecode length)
The main thing is that qstrs have an indirection into the constant table. So the bytecode itself is completely static/const and does not need to be changed when loading. Thus it can be executed directly from flash.

When a .mpc file is loaded, the bytecode is loaded verbatim, then the constant table is generated. The VM now needs a pointer to the bytecode and the constant table, in order to execute the code.

I think this is a good starting point for a discussion of exactly what we want from persistent bytecode, and how to modify the above scheme to make it something useful for our needs. So fire away with comments/criticism/etc!

—
Reply to this email directly or view it on GitHub.

pfalcon · 2015-10-23T22:14:14Z

We'll need to release next version as 1.6 ;-).

Some quick comments:

6 byte header

That's huge, can we reduce this to 2-3 bits? ;-)

The main thing is that qstrs have an indirection into the constant table.

Do you actually mean that all constant objects are referenced indirectly via constant table? That's the only way I could quickly interpret it. When I thought how to implement it, first idea in list was to give up varlen encoding for qstrs, to be able to patch in references to them on "linking" (i.e. persistent bytecode has e.g. local qstr ids (or perhaps even offsets into table), on loading each qstr is added to global table, and its global id is patched back). Of course, having r/o bytecode is more beneficial (well, depends on environment of course), but it's unclear how to deal with global qstr table.

Also, to clarify, "code chunk" starts with "- bytecode length (encoded as var-uint)" and ends with "- literal data for all constants (eg bignum, float)", optionally followed by recursive code chunks of the same structure, with the top-level chunk being module-level code? It's also not clear how code chunks are linked together - we used to encode direct pointer in bytecode.

Well, I guess code will clarify some of these questions.

dpgeorge · 2015-10-23T22:55:25Z

All "external pointing entities" (qstr, const, nested code blocks) are now encoded in the bytecode as an index (usually 1 byte) into the local constant table. This table is created when loading. Pointers to nested code blocks are stored in this table and are made when unpacking the nested persistent bytecode. Will post code tomorrow! It's a bit messy at the moment.

dhylands · 2015-10-23T23:43:10Z

Sweet.

I can see some add-on utilities that would be useful.

If you don't have it already, having some type of version information so that a newer version of the firmware can tell if its capable of executing some older bytecodes would be useful.

So something that parallels frozen modules but for precompiled code.

Having a mechanism to "chunk" up the bytes code so that they fit conveniently into 512-byte file system blocks might be something worth exploring. That way the persistent bytecode could be stored as a file, and even though the file isn't contiguous it could still be executed directly from flash.

Storing and manipulating the bytecode blocks outside of the filesystem probably needs some type of index. Due to the size of the flash pages, we'd probably need some host side support for managing persistent bytecode in say upper flash. The host tool could transfer the existing code from flash, and allow portions to be added/updated and then written back to flash on the board. If an sdcard was available, then it could be used for temporary storage.

hoihu · 2015-10-24T23:17:56Z

Great news!

In other words - "VM runs bytecode from flash", is that right?

I'm looking forward to less RAM requirements :) Great news for low-power, low cost platforms.
Also - because the compile stage is omitted there should be a significant decrease in FLASH as well I presume?

So what about the different code emitters (native,viper)? They are bound to a specific VM I believe - would they still be supported? Can bytecode segments be tagged so the VM nows how to interpret it?

Thinking of it - Viper, as I understand, emits thumb instructions so the precomiled output would actually be a normal hex file and wouldn't require a VM on the micro ;)

Looking forward to some more details!

dpgeorge · 2015-11-02T22:52:07Z

Version 2 of persistent bytecode posted at #1577.

If you don't have it already, having some type of version information so that a newer version of the firmware can tell if its capable of executing some older bytecodes would be useful.

Yes there is a header with version info.

Having a mechanism to "chunk" up the bytes code so that they fit conveniently into 512-byte file system blocks might be something worth exploring. That way the persistent bytecode could be stored as a file, and even though the file isn't contiguous it could still be executed directly from flash.

That's really difficult. You'd need to heavily modify the VM to take into account the fact that the next byte of the bytecode may be in another 512-byte-block (eg decoding a variable-length integer, the first byte might be in one block, the second byte in the next block). I don't think it's worth it.

Instead it's better/easier to just freeze the precompiled bytecode into the firmware.

In other words - "VM runs bytecode from flash", is that right?

Yes, that can now be done using frozen bytecode.

Also - because the compile stage is omitted there should be a significant decrease in FLASH as well I presume?

Yes, about 20k less for Thumb2 archs. But you don't get a REPL anymore :(

So what about the different code emitters (native,viper)?

Not yet supported, but would be possible to make persistent native code blocks.

dpgeorge · 2016-05-06T15:42:31Z

Persistent frozen bytecode is now merged in master.

…hon#222) The frozen module `_boot.py` was not being loaded on restart because `pyexec_frozen_module()` did not know about the new `.frozen` pseudo-directory. Updated lower-level routine to look in the right place. Also made ".frozen" and related values be `#define`s.

dpgeorge mentioned this issue Oct 12, 2015

py/compile: Remove "return if" optimisation, it's rarely used. #1505

Closed

dpgeorge closed this as completed May 6, 2016

aykevl mentioned this issue Sep 12, 2017

Native modules in ELF format #3311

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

Towards persistent bytecode #222

Towards persistent bytecode #222

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Towards persistent bytecode #222

Towards persistent bytecode #222

Comments

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!