ujson should be rewritten to take a stream as input natively #2489
Yes, I can work on this. But having ujson.load(file) will only solve half of the problem: the data that's loaded takes up roughly as much RAM as the size of the file, and the JSON data that comes from PyPI is pretty large because it includes a list of all versions of the package. E.g., the micropython-upip package has a 15k JSON data file. So, over time, as more versions get added for each package, things will stop working on low-heap ports. Is it possible to get PyPI to store only recent versions (e.g., delete older ones)?
You can delete old versions from PyPI. Another option would be to use an event-based parser (similar to SAX) that doesn't build a tree in memory, but instead calls a callback for every node and leaves it up to the user what to do with the data.
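A minimal sketch of the event-based (SAX-like) idea in Python, assuming a text stream with a `read(1)` method: the parser pulls one character at a time and reports each node to a callback instead of building a tree. `parse_events`, the event names (`start_map`, `key`, `value`, `end_array`, and so on) and the `callback(kind, value)` signature are hypothetical illustrations, not part of ujson. Number and literal parsing is lenient, and `\uXXXX` escapes are not decoded.

```python
import io

# Common single-char escapes; anything else (incl. \" \\ \/) maps to itself.
# \uXXXX is NOT decoded in this sketch.
_ESCAPES = {'n': '\n', 't': '\t', 'r': '\r', 'b': '\b', 'f': '\f'}
_LITERALS = {'true': True, 'false': False, 'null': None}


class _Reader:
    """One-character lookahead over a text stream; '' signals end of input."""

    def __init__(self, stream):
        self.stream = stream
        self.ch = stream.read(1)

    def take(self):
        c = self.ch
        self.ch = self.stream.read(1)
        return c

    def skip_ws(self):
        while self.ch and self.ch in ' \t\r\n':
            self.take()


def _string(r):
    r.take()  # opening quote; callers check for it first
    out = []
    while True:
        c = r.take()
        if c == '':
            raise ValueError('unterminated string')
        if c == '"':
            return ''.join(out)
        if c == '\\':
            esc = r.take()
            c = _ESCAPES.get(esc, esc)
        out.append(c)


def _scalar(r):
    # Numbers and true/false/null run until whitespace or a delimiter.
    chars = []
    while r.ch and r.ch not in ' \t\r\n,]}':
        chars.append(r.take())
    word = ''.join(chars)
    if word in _LITERALS:
        return _LITERALS[word]
    try:
        return int(word)
    except ValueError:
        return float(word)  # raises ValueError on malformed input


def _walk(r, cb):
    r.skip_ws()
    c = r.ch
    if c == '{' or c == '[':
        is_map = c == '{'
        close = '}' if is_map else ']'
        r.take()
        cb('start_map' if is_map else 'start_array', None)
        r.skip_ws()
        if r.ch == close:
            r.take()
        else:
            while True:
                if is_map:
                    r.skip_ws()
                    if r.ch != '"':
                        raise ValueError('expected object key')
                    cb('key', _string(r))
                    r.skip_ws()
                    if r.take() != ':':
                        raise ValueError("expected ':'")
                _walk(r, cb)  # nested value: only the nesting depth is kept
                r.skip_ws()
                sep = r.take()
                if sep == close:
                    break
                if sep != ',':
                    raise ValueError('expected , or ' + close)
        cb('end_map' if is_map else 'end_array', None)
    elif c == '"':
        cb('value', _string(r))
    else:
        cb('value', _scalar(r))


def parse_events(stream, callback):
    """Report every JSON node in `stream` to callback(kind, value)."""
    r = _Reader(stream)
    _walk(r, callback)
    r.skip_ws()
    if r.ch:
        raise ValueError('trailing data')


events = []
parse_events(io.StringIO('{"name": "upip", "tags": [1, true]}'),
             lambda kind, value: events.append((kind, value)))
```

Because nothing is accumulated unless the callback chooses to, memory use grows with nesting depth (the recursion) rather than with document size; a callback that only counts keys, for instance, keeps a single integer.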
Another option would be to let the user specify up front, before running the parser, which parts of the data they are interested in (for example, using an XPath-like language), and only collect those parts.
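A minimal sketch of the "collect only the requested parts" idea in Python. Instead of an XPath-like language it assumes a much simpler path: a list of object keys (`str`) and array indices (`int`). Everything off that path is consumed from the stream and discarded, and only the selected subtree's raw text is buffered and then decoded with the standard `json.loads` (on MicroPython that would presumably be `ujson.loads`). `select` and its path format are hypothetical; skipped parts are not validated, and reading stops as soon as the selected value has been consumed. The sample document is made up for illustration.

```python
import io
import json  # only decodes keys and the small selected fragment


class _Reader:
    """One character of lookahead over a text stream ('' means end of input)."""

    def __init__(self, stream):
        self.stream = stream
        self.ch = stream.read(1)

    def take(self):
        c = self.ch
        if c == '':
            raise ValueError('unexpected end of input')
        self.ch = self.stream.read(1)
        return c

    def skip_ws(self):
        while self.ch and self.ch in ' \t\r\n':
            self.take()


class _Discard:
    def append(self, c):  # drops characters: skipped data is never stored
        pass


def _string(r, out):
    # Copy a string token verbatim (quotes and escapes included) into `out`.
    out.append(r.take())  # opening quote
    while True:
        c = r.take()
        out.append(c)
        if c == '\\':
            out.append(r.take())  # an escaped char can't end the string
        elif c == '"':
            return


def _value(r, out):
    # Consume one JSON value, copying its raw text into `out`.
    r.skip_ws()
    if r.ch == '"':
        _string(r, out)
    elif r.ch == '{' or r.ch == '[':
        depth = 0
        while True:
            if r.ch == '"':
                _string(r, out)  # brackets inside strings must not count
                continue
            c = r.take()
            out.append(c)
            if c == '{' or c == '[':
                depth += 1
            elif c == '}' or c == ']':
                depth -= 1
                if depth == 0:
                    return
    else:  # number, true, false or null: runs until whitespace or a delimiter
        while r.ch and r.ch not in ' \t\r\n,]}':
            out.append(r.take())


def select(stream, path):
    """Return the value at `path` (str keys for objects, int indices for arrays)."""
    r = _Reader(stream)
    skip = _Discard()
    for step in path:
        r.skip_ws()
        if isinstance(step, str):
            if r.take() != '{':
                raise KeyError(step)
            while True:
                r.skip_ws()
                if r.ch != '"':
                    raise KeyError(step)  # '}' reached: key not present
                raw = []
                _string(r, raw)
                r.skip_ws()
                if r.take() != ':':
                    raise ValueError("expected ':'")
                if json.loads(''.join(raw)) == step:
                    break  # reader now sits at this member's value
                _value(r, skip)
                r.skip_ws()
                if r.take() != ',':
                    raise KeyError(step)
        else:
            if r.take() != '[':
                raise IndexError(step)
            for _ in range(step):
                _value(r, skip)
                r.skip_ws()
                if r.take() != ',':
                    raise IndexError(step)
            r.skip_ws()
            if r.ch == ']':
                raise IndexError(step)
    raw = []
    _value(r, raw)  # only the selected subtree is buffered
    return json.loads(''.join(raw))


doc = ('{"releases": {"1.0": [{"size": 1}], "1.1": []}, '
       '"info": {"name": "demo", "version": "1.1"}}')
version = select(io.StringIO(doc), ["info", "version"])
```

The selected subtree still needs a contiguous buffer, but only of its own size, so a large list of old releases that sits off the path never occupies the heap.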
> Yes, I can work on this.
OK, thanks, because otherwise I could work on this at some point, and my plan would be: remove string parsing support; implement stream parsing by reading a single char at a time, leaving optimization for later; then reintroduce string parsing by wrapping the string in io.StringIO internally.
> But having ujson.load(file) will only solve half of the problem: the
> actual data that's loaded takes up roughly the same room in RAM as
Yes, that solves the problem of needing a contiguous block the size of the JSON file, which is the most pressing problem, but not the memory problem overall.
> Is it possible to get PyPI to store only recent versions (eg delete
> older ones)?
Yep, either manually or, if an API exists, by developing an automation toolset.
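The plan above (stream parsing that reads a single char at a time, with string parsing reintroduced by wrapping the string in `io.StringIO`) can be sketched in Python as a small recursive-descent parser. This illustrates the approach in portable Python, not ujson's actual C implementation: `load`/`loads` mirror the names used in the thread, the underscore helpers are hypothetical, number handling is deliberately lenient, and `\u` surrogate pairs are not combined.

```python
import io

_ESCAPES = {'"': '"', '\\': '\\', '/': '/', 'b': '\b', 'f': '\f',
            'n': '\n', 'r': '\r', 't': '\t'}


class _Reader:
    """Wraps a text stream and keeps exactly one character of lookahead."""

    def __init__(self, stream):
        self.stream = stream
        self.ch = stream.read(1)  # '' means end of input

    def take(self):
        c = self.ch
        self.ch = self.stream.read(1)  # the only way input is ever read
        return c

    def skip_ws(self):
        while self.ch and self.ch in ' \t\r\n':
            self.take()

    def expect(self, c):
        self.skip_ws()
        if self.take() != c:
            raise ValueError('expected ' + repr(c))


def _string(r):
    r.expect('"')
    out = []
    while True:
        c = r.take()
        if c == '':
            raise ValueError('unterminated string')
        if c == '"':
            return ''.join(out)
        if c == '\\':
            e = r.take()
            if e == 'u':  # four hex digits; surrogate pairs not combined
                out.append(chr(int(''.join(r.take() for _ in range(4)), 16)))
            elif e in _ESCAPES:
                out.append(_ESCAPES[e])
            else:
                raise ValueError('bad escape')
        else:
            out.append(c)


def _number(r):
    chars = []
    while r.ch and r.ch in '+-0123456789.eE':
        chars.append(r.take())
    text = ''.join(chars)
    if not text:
        raise ValueError('unexpected character %r' % r.ch)
    return float(text) if ('.' in text or 'e' in text or 'E' in text) else int(text)


def _container(r, close, item):
    """Parse comma-separated entries up to `close`; `item` parses one entry."""
    r.take()  # opening bracket
    r.skip_ws()
    if r.ch == close:
        r.take()
        return
    while True:
        item()
        r.skip_ws()
        sep = r.take()
        if sep == close:
            return
        if sep != ',':
            raise ValueError('expected , or ' + close)


def _value(r):
    r.skip_ws()
    c = r.ch
    if c == '{':
        obj = {}

        def member():
            key = _string(r)
            r.expect(':')
            obj[key] = _value(r)
        _container(r, '}', member)
        return obj
    if c == '[':
        arr = []
        _container(r, ']', lambda: arr.append(_value(r)))
        return arr
    if c == '"':
        return _string(r)
    for word, val in (('true', True), ('false', False), ('null', None)):
        if c == word[0]:
            for expected in word:
                if r.take() != expected:
                    raise ValueError('bad literal')
            return val
    return _number(r)


def load(stream):
    """Parse one JSON document, pulling characters from `stream` one at a time."""
    r = _Reader(stream)
    value = _value(r)
    r.skip_ws()
    if r.ch:
        raise ValueError('trailing data')
    return value


def loads(s):
    """String parsing reintroduced on top of stream parsing."""
    return load(io.StringIO(s))
```

Reading one char per call is slow on a real socket or file, which matches the "leaving optimization for later" step; a buffered reader could later replace `_Reader` without touching the grammar functions.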
Implemented in e93c1ca.
Thanks, load/dump was the one thing where we usually had to call gc.collect() first to avoid out-of-memory situations.
Thanks, that was fast! ;-) (I actually didn't expect it to be so fast.)
It turned out to be quite a straightforward change (also fun!). I think we should implement ujson.dump(file) as well. It should be easy and minimal, because the print code can already write to an arbitrary stream.
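A small round trip through the stream-based dump/load pair, in Python. It assumes a MicroPython firmware whose `ujson` provides `dump(obj, stream)` and `load(stream)`; the import fallback lets the same sketch run on CPython, whose `json` module has both functions with those arguments. An in-memory `io.StringIO` stands in for a real file opened with `open(path, 'w')` or `open(path)`, and the sample data is made up.

```python
import io

try:
    import ujson as json  # MicroPython, assuming a firmware with dump/load
except ImportError:
    import json  # CPython: json.dump(obj, fp) / json.load(fp)

index = {"name": "micropython-upip", "versions": ["1.0", "1.1"]}

buf = io.StringIO()
json.dump(index, buf)      # serializer writes straight into the stream
buf.seek(0)                # rewind so the parser starts at the beginning
restored = json.load(buf)  # parser takes its input from the stream
```

With a real file, the serialized text goes straight to the stream instead of first being built as one string that is then written out.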
I raised this issue when ujson was first implemented, but only in passing; now it actually causes problems when trying to port upip to low-heap systems like esp8266. The function at https://github.com/micropython/micropython-lib/blob/master/upip/upip.py#L131 may load ~4K JSON files, and there's already a shortage of contiguous space of that size (4K for the zlib dictionary, at least 4K for the TLS buffer). Until this issue is fixed, upip on esp8266 won't work reliably (and there's no saying it would work reliably even after the fix).
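The difference this issue asks for, sketched in Python with hypothetical helper names (this is not upip's actual code): calling `read()` before parsing needs one contiguous string the size of the whole JSON document, whereas `load(stream)` lets the parser pull characters as it goes. Note that CPython's `json.load` simply calls `read()` on the stream internally, so the memory benefit only appears with an incremental parser such as the ujson change discussed in this thread.

```python
import io

try:
    import ujson as json  # MicroPython
except ImportError:
    import json  # CPython fallback so the sketch runs anywhere


def metadata_via_buffer(stream):
    # Materializes the whole document as one string first: the heap must
    # have a free contiguous block at least as large as the JSON text.
    text = stream.read()
    return json.loads(text)


def metadata_via_stream(stream):
    # Hands the stream to the parser, which reads from it as needed; with an
    # incremental parser no full-size copy of the text is required.
    return json.load(stream)


# Hypothetical, heavily shortened stand-in for a package's JSON metadata.
SAMPLE = '{"info": {"version": "1.1"}, "releases": {"1.0": [], "1.1": []}}'
```

Both helpers return the same data; only their peak memory profile differs, and that difference is what matters on a heap as fragmented as esp8266's.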