RFC: Add zipfile support by dhylands · Pull Request #1797 · micropython/micropython · GitHub

RFC: Add zipfile support #1797


Closed: dhylands wants to merge 3 commits


dhylands
Contributor

What I have coded so far seems to be working, although it needs some more testing. The next step after this is to hook up zipfiles into the import mechanism so that you can add a zipfile to sys.path.

Part of this adds mpfile.c/.h which gives a nice C API for accessing files or "file-like" objects. For example, a python script contained within a zipfile would be read using a "file-like" object (in particular an instance of the ZipExtFile object). File-like objects could also be written in python (see the ByteFile class in tests/extmod/zipfile1 for an example).
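As a rough illustration of what "file-like" means here, the sketch below defines a small read-only wrapper around a bytes object. It is not the ByteFile class from the PR's tests; the class body, the `read_header` helper, and the sample data are all hypothetical, and only the `read`/`seek`/`tell` protocol is assumed.

```python
class ByteFile:
    """Hypothetical read-only file-like wrapper around a bytes object."""

    def __init__(self, data):
        self._data = bytes(data)
        self._pos = 0

    def read(self, size=-1):
        # A negative (or missing) size means "read to the end".
        if size is None or size < 0:
            size = len(self._data) - self._pos
        chunk = self._data[self._pos:self._pos + size]
        self._pos += len(chunk)
        return chunk

    def seek(self, offset, whence=0):
        # whence: 0 = from start, 1 = from current position, 2 = from end.
        base = (0, self._pos, len(self._data))[whence]
        self._pos = max(0, min(len(self._data), base + offset))
        return self._pos

    def tell(self):
        return self._pos


def read_header(f, n=4):
    """Works with any object exposing read/seek, not just real files."""
    f.seek(0)
    return f.read(n)


f = ByteFile(b"PK\x03\x04rest-of-archive")
signature = read_header(f)
```

Any consumer written against `read`/`seek`/`tell` can then take a real file, an in-memory buffer, or a member of an archive without caring which one it got.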

Using mpfile would allow us to remove the OS-specific lexers and instead create an mpfile-based lexer.

To support importing from a zipfile, I think that I need to extend mp_raw_load_code_load_file and mp_lexer_new_from_file to support reading from a file or file-like object.

Using a compressed zipfile typically doubles the effective amount of storage that you have, and it also eliminates the average 256 bytes that are wasted per file on the regular filesystem.
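One way to check the compression claim for a given code base is to measure it on the desktop with CPython's `zipfile`. The sketch below does that for a made-up, deliberately repetitive module (`handlers.py` and its contents are hypothetical), so its ratio will likely be better than what real source achieves.

```python
import io
import zipfile

# Hypothetical module source; repetitive on purpose, so it compresses well.
source = "".join(
    "def handler_{0}(pin):\n    return pin.value() + {0}\n\n".format(i)
    for i in range(200)
).encode()

# Write the module into an in-memory archive using DEFLATE.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("handlers.py", source)

# Read back the sizes recorded in the archive's directory.
with zipfile.ZipFile(buf) as zf:
    info = zf.getinfo("handlers.py")
    ratio = info.file_size / info.compress_size

print(info.file_size, info.compress_size, round(ratio, 1))
```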

On the unix build, this adds about 5.3K, and on stmhal it adds about 2.4K.

I've coded this so that it should work on a big-endian MCU, but I don't have anything to test it on.
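For context on the endianness point: the zip format stores its header fields little-endian, so a parser has to decode them explicitly rather than rely on the host byte order. The sketch below shows the idea in Python, not the PR's C code; `parse_eocd` is a hypothetical helper that only handles archives without a trailing comment.

```python
import io
import struct
import zipfile

# End-of-central-directory record: 22 bytes, always little-endian ("<").
EOCD = struct.Struct("<4s4H2LH")


def parse_eocd(archive):
    """Parse the EOCD of an archive that has no trailing comment."""
    fields = EOCD.unpack(archive[-EOCD.size:])
    signature, _disk, _cd_disk, _here, total, cd_size, cd_offset, _clen = fields
    if signature != b"PK\x05\x06":
        raise ValueError("not a zip archive (or it has a comment)")
    return total, cd_size, cd_offset


# Build a two-member archive in memory to parse.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("a.py", "x = 1\n")
    zf.writestr("b.py", "y = 2\n")

total, cd_size, cd_offset = parse_eocd(buf.getvalue())
```

Because the format string pins the byte order, the same code gives the same answer on a big-endian host.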

One caveat to be aware of is that uzlib doesn't seem to support partial decompressing (or streaming), so the code needs to allocate enough memory for both the compressed and uncompressed data. The compressed data is freed as soon as the decompression is done.
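The memory cost of one-shot decompression can be seen with CPython's `zlib`, used here as a stand-in for uzlib. The payload and the `deflate_raw` helper are hypothetical.

```python
import zlib


def deflate_raw(data):
    # wbits=-15 produces a raw DEFLATE stream, as stored inside zip members.
    c = zlib.compressobj(9, zlib.DEFLATED, -15)
    return c.compress(data) + c.flush()


original = b"led.toggle()\n" * 500
compressed = deflate_raw(original)

# One-shot: the whole compressed input and the whole output exist together.
restored = zlib.decompress(compressed, -15)
peak_bytes = len(compressed) + len(restored)
```

Here `peak_bytes` counts only the two buffers; the decompressor's own working state comes on top of that.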

@dhylands force-pushed the zipfile branch 15 times, most recently from 3bd1bab to c22ef4d (January 25, 2016 06:37)
@peterhinch
Contributor

That looks very useful given the limited flash space on the Pyboard. If this is accepted, corresponding mods to rshell to copy/edit/delete files in a zipped directory would be good.

@stinos
Contributor
stinos commented Jan 26, 2016

Just out of interest, have you checked other compression algorithms? I seem to remember last time I tested something like this, plain zip was almost the worst option both speed- and compression-wise. I did not compare the code size though.

@peterhinch
Contributor

@stinos zlib is part of the Python standard library, so there are interoperability benefits in implementing the same algorithm.

@dhylands force-pushed the zipfile branch 5 times, most recently from a06f15f to fee7f18 (January 31, 2016 07:54)
@dhylands
Contributor Author

Importing modules from a zipfile seems to be working now. I've also coded (but not yet tested) importing byte-code modules.
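For reference, this is the desktop CPython behavior that zipfile import mirrors: an archive placed on `sys.path` is searched like a directory. The sketch below is not the PR's implementation, and the archive and module names are hypothetical.

```python
import os
import sys
import tempfile
import zipfile

# Build a small archive containing one module (names are hypothetical).
archive = os.path.join(tempfile.mkdtemp(), "bundle.zip")
with zipfile.ZipFile(archive, "w", compression=zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("zipdemo_blink.py", "def period_ms():\n    return 500\n")

# Putting the archive itself on sys.path makes its contents importable.
sys.path.insert(0, archive)
import zipdemo_blink

print(zipdemo_blink.period_ms())  # 500
```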

I added the ability to configure zipfile support and/or zipimport support. Size increases for stmhal:

2040 - zipimport alone
2536 - ZipFile alone
3072 - Both ZipFile and zipimport
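These figures imply that the two features share a fair amount of code, presumably the archive parsing, although the thread does not say which part. The overlap works out as follows:

```python
zipimport_only = 2040   # bytes, from the figures above
zipfile_only = 2536
both = 3072

# Code counted twice when the two options are enabled separately.
shared = zipimport_only + zipfile_only - both
print(shared)  # 1504
```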

I have currently disabled the tests for Windows. We can turn them on if the Windows port decides to enable the options by default.

@dpgeorge
Member
dpgeorge commented Feb 1, 2016

Thanks @dhylands. There is a lot of stuff here, some of which brings underlying structural changes, like the mp_file stuff; if we used that, it would make sense to convert all lexers to use it. So it's going to take some time to consider the approach you've taken.

What is the main reason for this? Is it lack of flash space? Can you please give the use case that prompted you to do this?

PR #1811 (frozen bytecode) might eliminate a lot of the need for zipfile import. If you are anyway going to be compiling firmware yourself, then frozen bytecode is the most optimal thing to do. It requires no overhead for the filesystem, no ram for decompression or compilation, and the bytecode runs from flash (also extra qstrs from your scripts are in flash).

@dhylands
Contributor Author
dhylands commented Feb 1, 2016

Yeah - my primary use case is filling the flash storage, in my case on the Espruino Pico. There is no sdcard available on this device to expand the space.

Using precompiled bytecode takes about half the space of using source code.
Using zip compressed precompiled bytecode takes up about half the space of that.

With the filesystem, there is about half a block (256 bytes) of space wasted per file stored on the filesystem. Using a zipfile the wasted space per file is much less.
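The per-file waste comes from rounding each file up to whole blocks. The sketch below assumes 512-byte blocks (inferred from "half a block (256 bytes)") and uses made-up file sizes.

```python
def fs_bytes(sizes, block=512):
    """Space used when each file is rounded up to whole blocks."""
    return sum(-(-s // block) * block for s in sizes)


sizes = [300, 700, 1100, 90]   # hypothetical file sizes in bytes
used = fs_bytes(sizes)         # 512 + 1024 + 1536 + 512
slack = used - sum(sizes)      # bytes lost to rounding
print(used, slack)  # 3584 1394
```

In a zipfile the members are packed back to back, so the rounding loss is paid once for the archive rather than once per file.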

Using frozen bytecode would help, although the Espruino Pico flash is very close to full (about 11k available). Reducing the filesystem would give back some of that space (there is one 64K block that we're only using 16K of, so removing that 16K from the FS gives back the whole 64K block of flash).

The disadvantage of using frozen bytecode is that you need to rewrite the entire image for each update.

So using zipfile import seemed like a reasonable tradeoff (2K to implement), and it doesn't require reflashing the firmware.

I did make zipimport and zipfile support completely configurable.

Once precompiled bytecode (.mpy files) lands, that buys me a bunch of space, and I can use that just as easily as using zipfiles.

@pfalcon
Contributor
pfalcon commented Feb 1, 2016

Codebase updated to uzlib 1.2.2 with the fix, @dhylands , please rebase.

This supports decompressing stored files, and if MICROPY_PY_ZLIB is enabled then DEFLATED files (the default compression that zip uses) can be decompressed.
@dmazzella
Contributor

news on this?

@dhylands
Contributor Author

Enough has changed that this probably needs to be totally redone. Due to personal reasons, I haven't had a large enough block of time to do anything with this. I don't mind if this is closed; I can reopen it in the future if I get a chance to rework it.

@klardotsh
klardotsh commented Oct 1, 2018

I'm quite tempted to take a look into reviving this - I've got something like 106k of FROZEN_MPY going to my device at this point and the project still isn't complete - phew! And that's down from about 115k of raw Python scripts (with comments and la-dee-da). Compare that to: 16K for a .tar.gz of my source tree, and 24K for a .zip (created with tar cjvf blah.tar.gz mysrc and 7z a blah.zip mysrc, respectively).
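One likely reason for a gap like this (the thread does not state it) is that zip compresses every member independently, while a compressed tarball compresses the whole stream and can exploit redundancy across files. The sketch below exaggerates the effect with a contrived tree: ten identical files whose content is incompressible on its own. All names and data are hypothetical.

```python
import io
import random
import tarfile
import zipfile

# Ten files with identical, individually incompressible content.
rng = random.Random(0)
payload = bytes(rng.getrandbits(8) for _ in range(1000))
files = {"mod{}.py".format(i): payload for i in range(10)}

# zip: each member is deflated on its own.
zip_buf = io.BytesIO()
with zipfile.ZipFile(zip_buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
    for name, data in files.items():
        zf.writestr(name, data)

# tar.gz: one gzip stream over the whole archive.
tgz_buf = io.BytesIO()
with tarfile.open(fileobj=tgz_buf, mode="w:gz") as tar:
    for name, data in files.items():
        info = tarfile.TarInfo(name)
        info.size = len(data)
        tar.addfile(info, io.BytesIO(data))

zip_size = len(zip_buf.getvalue())
tgz_size = len(tgz_buf.getvalue())
print(zip_size, tgz_size)
```

The trade-off is that a single member of a zip can be located and inflated by itself, which is what an importer needs, whereas a compressed tarball has to be read from the start.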

I think there's still more than plenty of value here (it's certainly easier than my other alternative once the project gets big enough - having the "Python part" of the project run only on the PC, and ultimately flash a compiled C hex to the PyBoard/NRF target. I've already long since blown past the flashable-with-stock-mpconfigport size on one of my previously-target boards...).

@pfalcon
Contributor
pfalcon commented Oct 1, 2018

@klardotsh: How is it going to help you? Do you have too much RAM to trade for flash? That's an unlikely situation for a typical "deeply embedded" board. It would help low-end Linux boards, yeah, those which have 32MB RAM and 4MB flash.

If your frozen bytecode takes too much space, make sure you compile it with the right optimization settings. And if you do, the next step is to look into removing too much of the reflection information included, like method/kwarg names, which is only required for overdynamicity in Python, which is an optional, extra feature in MicroPython.

@pfalcon
Contributor
pfalcon commented Oct 1, 2018

A few comments on this PR:

py: Add C API for reading from file or file-like objects

This particular commit was authored 2016-01-24, but since 2014-01-08 we have already had a C API for reading from file-like objects. It's called py/stream.c.

extmod: Add uzipfile

This is apparently useful, but the commit message should describe which subset of CPython's zipfile API it implements.

One caveat to be aware of is that uzlib doesn't seem to support partial decompressing (or streaming)

Ok, since about 2016-08-17 it supports it.
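What streaming decompression buys can be sketched with CPython's `zlib`, again as a stand-in and not uzlib's actual API: input arrives in small chunks and output is handed on in small pieces, so neither full buffer has to be held. `inflate_stream`, the chunk size, and the payload are hypothetical.

```python
import zlib


def inflate_stream(chunks, out_limit=256):
    """Yield decompressed pieces; decompress() output is capped at out_limit."""
    d = zlib.decompressobj(wbits=-15)  # raw DEFLATE
    for chunk in chunks:
        data = chunk
        while data:
            piece = d.decompress(data, out_limit)
            if piece:
                yield piece
            # Input that was not consumed because the output cap was hit.
            data = d.unconsumed_tail
    # flush() drains whatever output is still pending.
    tail = d.flush()
    if tail:
        yield tail


original = b"led.toggle()\n" * 500
c = zlib.compressobj(9, zlib.DEFLATED, -15)
compressed = c.compress(original) + c.flush()

# Feed the compressed data 16 bytes at a time.
chunks = [compressed[i:i + 16] for i in range(0, len(compressed), 16)]
pieces = list(inflate_stream(chunks))
```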

Note that there's an API change in upstream uzlib 2.9.xx (pre-3.0), I'm waiting for my other patches to be processed before working on upgrading moduzlib to it.

@klardotsh
klardotsh commented Oct 1, 2018

@pfalcon I seem to get no difference in output filesize on most files no matter what optimization levels I call mpy-cross with. That said, I can easily take a 20k mpy file down to 8k by throwing it in a zip file.

Even being able to zip/gzip individual modules would be fantastic - my project isn't super heavy on RAM, so a few files (especially ones mostly made of consts) being deflated at runtime is doable.
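A minimal sketch of the per-module idea, written for desktop CPython: the module's source is stored compressed and inflated in RAM at load time. `load_deflated` and the constants module are hypothetical, and a MicroPython version would use that port's own decompression module and import hooks.

```python
import types
import zlib

source = b"LED_PIN = 13\nBLINK_MS = 250\n"   # hypothetical constants module
stored = zlib.compress(source, 9)             # what would sit in flash


def load_deflated(name, blob):
    """Inflate a module's source in RAM and execute it into a new module."""
    module = types.ModuleType(name)
    code = compile(zlib.decompress(blob), name + ".py", "exec")
    exec(code, module.__dict__)
    return module


consts = load_deflated("consts", stored)
```

The RAM cost is the decompressed source plus the compiled module, which is the trade-off raised earlier in the thread.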

FWIW my target devices are currently a PyBoard and an NRF52840 dev board, so I've got the flash on these (what, 1MB or so?). Other devices I'd enjoy being able to throw this project on aren't as lucky - for example the Adafruit Feather nRF52832 has something like 256k, and some STM32 devices I'd like to port to I believe are 128k (meaning I'm pretty sure I'd have to shrink my project's code somehow - even if I rip the compiler and REPL out of MicroPython, there's no way I'm fitting uPy into... what, 20K that I'd have left in my current form?)

It makes me wonder a little how folks actually run full projects on MicroPython/CircuitPython boards that don't have ROM as large as the PyBoard's - GPIO SD cards? Clever hackery I haven't discovered yet?

@peterhinch
Contributor

@klardotsh There is official support for SD cards connected via an SPI interface.

nickzoic pushed a commit to nickzoic/micropython that referenced this pull request Apr 16, 2019
@dpgeorge
Member

Closing due to inactivity, and because it requires a lot of rework.

@dpgeorge closed this on May 18, 2021
7 participants