8000 Towards persistent bytecode · Issue #222 · micropython/micropython · GitHub
[go: up one dir, main page]

Skip to content

Towards persistent bytecode #222

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
pfalcon opened this issue Jan 24, 2014 · 23 comments
Closed

Towards persistent bytecode #222

pfalcon opened this issue Jan 24, 2014 · 23 comments
Labels
rfc Request for Comment

Comments

@pfalcon
Copy link
Contributor
pfalcon commented Jan 24, 2014

So, I had a surface look at uPy bytecode objects and compared that to CPython's .pyc (based on http://nedbatchelder.com/blog/200804/the_structure_of_pyc_files.html).

As expected, biggest difference is that .pyc includes pool of constants used by given module, and bytecode args refer to local constant pool, whereas bytecode generated by uPy refers to global VM pool (at least for qstr's). And as uPy using varlen encoding for qstr id's, that makes bytecode not suitable for persistence and reloading. For example, a string which had qstr id of 10, in another environment may have it as 10000, and not fit into varlen encoding for 10.

As being able to compile to standalone bytecode and load it on its own is apparently an important feature, them IMHO varlen encoding in bytecode arguments should be given up, and full mp_obj_t value should be used instead (in persistent bytecode that would be id in local constant pool, on loading a "linker" would patch it to be a global VM id or direct mp_obj_t value).

@dhylands
Copy link
Contributor

I think it would be good if it could be done without relinking (perhaps at a slight performance hit).

Maybe we need a double indirection (so that the python code indexes into a local table (which is in RAM), and that table is what gets "linked"). The ideal would be that the bytecodes can stay in flash and not need to be loaded into RAM at all.

@pfalcon
Copy link
Contributor Author
8000 pfalcon commented Jan 24, 2014

One interesting note here is that GC won't be able to detect object references in bytecode, because it expects pointers to be word-aligned, while bytecode is byte-aligned. This shouldn't affect uPy currently, but may complicate some advanced optimizations (for example, CPy appears to support complex constant expressions than just strings, for example, tuple of constant items. An implementation of that which would construct such object on heap and then patch its address into bytecode would be erroneous).

@pfalcon
Copy link
Contributor Author
pfalcon commented Jan 24, 2014

Maybe we need a double indirection

Double indirection is huge waste of memory ;-). Consider for example that varlen encoding was used to conserve memory, then augmenting each varlen "local id" with a table to map it to global id will take more memory than using global id directly (unless we aim for poor man's huffman encoding, hoping to get few local id's used many times again and again).

The ideal would be that the bytecodes can stay in flash and not need to be loaded into RAM at all.

Well, bytecode in flash is static, so it can be prelinked once, voila. The talk is how to support dynamic loading of bytecode modules.

@dhylands
Copy link
Contributor

Well, bytecode in flash is static, so it can be prelinked once, voila. The talk is how to support dynamic loading of bytecode modules.

I guess I'm missing something then.

Wouldn't that mean that you'd need to load all of the modules in exactly the same order? So that the indicies all get assigned the same each time?

@pfalcon
Copy link
Contributor Author
pfalcon commented Jan 25, 2014

Do you mean static or dynamic case? With static, I don't see any issues (except for toolset which make it work as expected).

For dynamic, that's exactly a problem. And the way to resolve it is that persistent bytecode (a-la .pyc) uses local module constant id's, as well as includes constant pool. Then, during loading (into RAM of course), "linker" will push constants to global pool, and patch bytecode to replace local id's with global id's (or pointers right away).

e.g.:

persistent:

LOAD_STR 2
...
CONST_POOL:
0: ...
1: ...
2: "foo"

loaded:

LOAD_STR 0x10001234
...
qstr pool:
...
567: 0x10001234

mem:
0x10001234: qstr("foo")

Note that for qstr, we have global inter table, so they're not affected by GC issue mentioned above, so we can put pointers straight into bytecode, saving double indirection lookup.

@dpgeorge
Copy link
Member

Note that the linker could re encode the byte code, encoding variable length qstrs ids with the correct value for the current runtime.

Var-len in bytecode arguments is an important feature to keep the byte code size down.

Complex constants (eg a tuple of strings) are built dynamically in uPy.

@pfalcon
Copy link
Contributor Author
pfalcon commented Mar 28, 2014

Up discussion due to #386. And here's something I wanted to respond for a while:

Note that the linker could re encode the byte code, encoding variable length [...] ids

Byte re-encoding is complicated! I'm not sure it's worth it, if there can be other ways to deal with it. For example, why not use global static varying length set at compile-time. 2 bytes should be well enough for MCU, 3 bytes will be well enough for any realistic uPy app, but 4 bytes are still possible.

@pfalcon
Copy link
Contributor Author
pfalcon commented Apr 3, 2014

Another observation: it could be that we have enough static qstr's to guarantee that any user-defined would be at least 2 bytes, and per previous comment, for MCU usage, 2 byte is just enough.

Unfortunately, we already have more than 128 static qstrs, and soon will have more than 256. I don't know if it's practical to suggest introducing separate bytecodes for loading builtin vs user qstrs, but if it is, we actually can have multiple bytecodes for builtins of different ranges, so can resolve this situation.

@dpgeorge
Copy link
Member
dpgeorge commented Apr 3, 2014

The main thing with qstrs in bytecode are the LOAD_{NAME,GLOBAL,ATTR,METHOD}, and to a lesser extent the STORE_{xx} variants. I would guess that LOAD_METHOD is the most common (but just a guess). So all these would need builtin vs user variants.

@errordeveloper
Copy link
Contributor

Would using ELF format solve some of these problems? I'd then recommend Contiki's ELF loader. It would also allow one to load a native module... Or we are hoping to have something more of an ABI?

@pfalcon
Copy link
Contributor Author
pfalcon commented May 7, 2014

@errordeveloper , I don't think that using ELF would help anyhow with bytecode - a custom lean format will be much easier to implement and support. But your link may be helpful for #583, if someone will ever get to it.

@TWHanson
Copy link
TWHanson commented Jun 9, 2015

Anything new on this issue? I'm working a project that would like to use MP but would also like to (need to?) use persistent bytecode.

Follow up: If the execution environment is sufficiently constrained, will persistent bytecode work? For example we'll be running in an embedded environment with no underlying file system. The script(s) that are run will be identical in content and order every time they're run. Additional constraints could possibly be supported.
Thx

@pfalcon
Copy link
Contributor Author
pfalcon commented Jun 9, 2015

@TWHanson: Perfect, we look for people who actually need this feature to implement it!

@hoihu
Copy link
Contributor
hoihu commented Jun 16, 2015

+1 for a "ROM bytecode" support for smaller micros...

and maybe even executable via REPL, for tests?

Something like
blob = fh.read('test.pyc','rb')
exec_raw(blob)

@dpgeorge
Copy link
Member

For example we'll be running in an embedded environment with no underlying file system.

uPy doesn't need a filesystem to function. You can still import modules so long as they are builtin or frozen.

@dpgeorge
Copy link
Member

Ok, I have persistent bytecode working! I will post code later, with a proper description, but for now I'll just explain the basics of how it works (note it is still WIP).

Example input file (persist.py):

x = 1
print(x)
print(x * 10)

def foo(z):
    print("persist!", z)

foo(123)

Then using unix port, this is compiled to (persist.mpc):

00000000: 4d50 4330 3031 3203 0000 0000 000a 2427  MPC0012.......$'
00000010: 4964 0000 0000 00ff 8124 021c 031c 0464  Id.......$.....d
00000020: 0132 1c05 1c06 8add 6401 3260 0924 071c  .2......d.2`.$..
00000030: 0814 807b 6401 3211 5b09 0001 083c 6d6f  ...{d.2.[....<mo
00000040: 6475 6c65 3e0a 7065 7273 6973 742e 7079  dule>.persist.py
00000050: 0178 0570 7269 6e74 0178 0570 7269 6e74  .x.print.x.print
00000060: 0178 0366 6f6f 0366 6f6f 1b04 0000 0100  .x.foo.foo......
00000070: 000a 6140 0000 0000 0000 00ff 1d03 1604  ..a@............
00000080: b064 0232 115b 0500 0003 666f 6f0a 7065  .d.2.[....foo.pe
00000090: 7273 6973 742e 7079 017a 0570 7269 6e74  rsist.py.z.print
000000a0: 0870 6572 7369 7374 21                   .persist!

Then you can import this same compiled code in unix (import persist) and also stmhal, and it works correctly.

The .mpc file contains a header, then bytecode, then a constant table, then nested code objects. In more detail:

- 6 byte header, MPC001
- bytecode length (encoded as var-uint)
- bytecode data
- number of qstrs to "link" (var uint)
- number of constant objects (var uint)
- number of nested code chunks (var uint)
- literal data for all qstrs (encoded as var-uint length followed by actual str data)
- literal data for all constants (eg bignum, float)
- nested code chunks (starting with bytecode length)

The main thing is that qstrs have an indirection into the constant table. So the bytecode itself is completely static/const and does not need to be changed when loading. Thus it can be executed directly from flash.

When a .mpc file is loaded, the bytecode is loaded verbatim, then the constant table is generated. The VM now needs a pointer to the bytecode and the constant table, in order to execute the code.

I think this is a good starting point for a discussion of exactly what we want from persistent bytecode, and how to modify the above scheme to make it something useful for our needs. So fire away with comments/criticism/etc!

@danicampora
Copy link
Member

Amazing!! :-)

On Oct 23, 2015, at 11:34 PM, Damien George notifications@github.com wrote:

Ok, I have persistent bytecode working! I will post code later, with a proper description, but for now I'll just explain the basics of how it works (note it is still WIP).

Example input file (persist.py):

x = 1
print(x)
print(x * 10)

def foo(z):
print("persist!", z)

foo(123)
Then using unix port, this is compiled to (persist.mpc):

00000000: 4d50 4330 3031 3203 0000 0000 000a 2427 MPC0012.......$'
00000010: 4964 0000 0000 00ff 8124 021c 031c 0464 Id.......$.....d
00000020: 0132 1c05 1c06 8add 6401 3260 0924 071c .2......d.2`.$..
00000030: 0814 807b 6401 3211 5b09 0001 083c 6d6f ...{d.2.[....<mo
00000040: 6475 6c65 3e0a 7065 7273 6973 742e 7079 dule>.persist.py
00000050: 0178 0570 7269 6e74 0178 0570 7269 6e74 .x.print.x.print
00000060: 0178 0366 6f6f 0366 6f6f 1b04 0000 0100 .x.foo.foo......
00000070: 000a 6140 0000 0000 0000 00ff 1d03 1604 ..a@............
00000080: b064 0232 115b 0500 0003 666f 6f0a 7065 .d.2.[....foo.pe
00000090: 7273 6973 742e 7079 017a 0570 7269 6e74 rsist.py.z.print
000000a0: 0870 6572 7369 7374 21 .persist!
Then you can import this same compiled code in unix (import persist) and also stmhal, and it works correctly.

The .mpc file contains a header, then bytecode, then a constant table, then nested code objects. In more detail:

  • 6 byte header, MPC001
  • bytecode length (encoded as var-uint)
  • bytecode data
  • number of qstrs to "link" (var uint)
  • number of constant objects (var uint)
  • number of nested code chunks (var uint)
  • literal data for all qstrs (encoded as var-uint length followed by actual str data)
  • literal data for all constants (eg bignum, float)
  • nested code chunks (starting with bytecode length)
    The main thing is that qstrs have an indirection into the constant table. So the bytecode itself is completely static/const and does not need to be changed when loading. Thus it can be executed directly from flash.

When a .mpc file is loaded, the bytecode is loaded verbatim, then the constant table is generated. The VM now needs a pointer to the bytecode and the constant table, in order to execute the code.

I think this is a good starting point for a discussion of exactly what we want from persistent bytecode, and how to modify the above scheme to make it something useful for our needs. So fire away with comments/criticism/etc!


Reply to this email directly or view it on GitHub.

@pfalcon
Copy link
Contributor Author
pfalcon commented Oct 23, 2015

We'll need to release next version as 1.6 ;-).

Some quick comments:

6 byte header

That's huge, can we reduce this to 2-3 bits? ;-)

The main thing is that qstrs have an indirection into the constant table.

Do you actually mean that all constant objects are referenced indirectly via constant table? That's the only way I could quickly interpret it. When I thought how to implement it, first idea in list was to give up varlen encoding for qstrs, to be able to patch in references to them on "linking" (i.e. persistent bytecode has e.g. local qstr ids (or perhaps even offsets into table), on loading each qstr is added to global table, and its global id is patched back). Of course, having r/o bytecode is more beneficial (well, depends on environment of course), but it's unclear how to deal with global qstr table.

Also, to clarify, "code chunk" starts with "- bytecode length (encoded as var-uint)" and ends with "- literal data for all constants (eg bignum, float)", optionally followed by recursive code chunks of the same structure, with the top-level chunk being module-level code? It's also not clear how code chunks are linked together - we used to encode direct pointer in bytecode.

Well, I guess code will clarify some of these questions.

@dpgeorge
Copy link
Member
dpgeorge commented Oct 23, 2015 via email

@dhylands
Copy link
Contributor

Sweet.

I can see some add-on utilities that would be useful.

If you don't have it already, having some type of version information so that a newer version of the firmware can tell if its capable of executing some older bytecodes would be useful.

So something that parallels frozen modules but for precompiled code.

Having a mechanism to "chunk" up the bytes code so that they fit conveniently into 512-byte file system blocks might be something worth exploring. That way the persistent bytecode could be stored as a file, and even though the file isn't contiguous it could still be executed directly from flash.

Storing and manipulating the bytecode blocks outside of the filesystem probably needs some type of index. Due to the size of the flash pages, we'd probably need some host side support for managing persistent bytecode in say upper flash. The host tool could transfer the existing code from flash, and allow portions to be added/updated and then written back to flash on the board. If an sdcard was available, then it could be used for temporary storage.

@hoihu
Copy link
Contributor
hoihu commented Oct 24, 2015

Great news!

In other words - "VM runs bytecode from flash", is that right?

I'm looking forward to less RAM requirements :) Great news for low-power, low cost platforms.
Also - because the compile stage is omitted there should be a significant decrease in FLASH as well I presume?

So what about the different code emitters (native,viper)? They are bound to a specific VM I believe - would they still be supported? Can bytecode segments be tagged so the VM nows how to interpret it?

Thinking of it - Viper, as I understand, emits thumb instructions so the precomiled output would actually be a normal hex file and wouldn't require a VM on the micro ;)

Looking forward to some more details!

@dpgeorge
Copy link
Member
dpgeorge commented Nov 2, 2015

Version 2 of persistent bytecode posted at #1577.

If you don't have it already, having some type of version information so that a newer version of the firmware can tell if its capable of executing some older bytecodes would be useful.

Yes there is a header with version info.

Having a mechanism to "chunk" up the bytes code so that they fit conveniently into 512-byte file system blocks might be something worth exploring. That way the persistent bytecode could be stored as a file, and even though the file isn't contiguous it could still be executed directly from flash.

That's really difficult. You'd need to heavily modify the VM to take into account the fact that the next byte of the bytecode may be in another 512-byte-block (eg decoding a variable-length integer, the first byte might be in one block, the second byte in the next block). I don't think it's worth it.

Instead it's better/easier to just freeze the precompiled bytecode into the firmware.

In other words - "VM runs bytecode from flash", is that right?

Yes, that can now be done using frozen bytecode.

Also - because the compile stage is omitted there should be a significant decrease in FLASH as well I presume?

Yes, about 20k less for Thumb2 archs. But you don't get a REPL anymore :(

So what about the different code emitters (native,viper)?

Not yet supported, but would be possible to make persistent native code blocks.

@dpgeorge
Copy link
Member
dpgeorge commented May 6, 2016

Persistent frozen bytecode is now merged in master.

@dpgeorge dpgeorge closed this as completed May 6, 2016
tannewt pushed a commit to tannewt/circuitpython that referenced this issue Sep 5, 2017
…hon#222)

The frozen module `_boot.py` was not being loaded on restart
because `pyexec_frozen_module()` did not know about the new `.frozen`
pseudo-directory. Updated lower-level routine to look in the right place.
Also made ".frozen" and related values be `#define`s.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
rfc Request for Comment
Projects
None yet
Development

No branches or pull requests

7 participants
0