ujson should be rewritten to take a stream as input natively · Issue #2489 · micropython/micropython · GitHub

Closed
pfalcon opened this issue Oct 7, 2016 · 8 comments

@pfalcon
Contributor
pfalcon commented Oct 7, 2016

I raised this issue when ujson was first implemented, but only in passing. Now it actually causes problems when porting upip to low-heap systems like esp8266. This function may load ~4K JSON files: https://github.com/micropython/micropython-lib/blob/master/upip/upip.py#L131 , and there is already a shortage of contiguous space of that size (4K for the zlib dictionary, at least 4K for the TLS buffer). Until this issue is fixed, upip on esp8266 won't work reliably (and there's no telling whether it will work reliably afterwards).
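
For illustration, here is a minimal sketch of the two calling conventions being contrasted. It uses CPython's `json` and `io.StringIO` as stand-ins for `ujson` and a real file or socket, and the payload is made up. Note that CPython's own `json.load` still reads the whole stream internally, so this only shows the interface that a native stream parser would make cheap, not an actual memory saving on CPython.

```python
import io
import json

# Made-up stand-in for a package metadata response.
payload = '{"info": {"name": "example-pkg", "version": "1.0"}}'


def load_via_string(stream):
    # Reads the entire document into one contiguous string, then parses it.
    # The raw text and the parsed objects exist in memory at the same time.
    return json.loads(stream.read())


def load_via_stream(stream):
    # Hands the stream straight to the parser. With a parser that consumes
    # the stream natively, no document-sized buffer is needed.
    return json.load(stream)


a = load_via_string(io.StringIO(payload))
b = load_via_stream(io.StringIO(payload))
print(a == b, b["info"]["version"])
```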

dpgeorge self-assigned this Oct 10, 2016
@dpgeorge
Member

Yes, I can work on this.

But having ujson.load(file) will only solve half of the problem: the loaded data takes up roughly as much RAM as the file itself, and the JSON data that comes from PyPI is pretty large because it includes a list of all versions of the package. E.g. the micropython-upip package has a 15k JSON data file. So, over time, as more versions get added for each package, things will stop working on low-heap ports.
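
A rough sketch of why stream loading is only half the fix, assuming the metadata has an `info.version` field and a `releases` mapping keyed by version (the sample data and the helper name `latest_release_only` are made up, and CPython's `json` stands in for `ujson`). Even if the caller keeps only the latest release, the full tree with every old version has to be built first, so the peak heap use still grows with the number of releases.

```python
import io
import json

# Made-up metadata: every published version is listed under "releases".
sample = json.dumps({
    "info": {"name": "example-pkg", "version": "1.2"},
    "releases": {
        "1.0": [{"url": "https://example.invalid/example-pkg-1.0.tar.gz"}],
        "1.1": [{"url": "https://example.invalid/example-pkg-1.1.tar.gz"}],
        "1.2": [{"url": "https://example.invalid/example-pkg-1.2.tar.gz"}],
    },
})


def latest_release_only(stream):
    # Peak memory is reached here: the whole tree, old versions included.
    meta = json.load(stream)
    version = meta["info"]["version"]
    files = meta["releases"][version]
    # Everything else becomes garbage only after `meta` goes out of scope.
    return version, files


version, files = latest_release_only(io.StringIO(sample))
print(version, files[0]["url"])
```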

Is it possible to get PyPI to store only recent versions (eg delete older ones)?

@deshipu
Contributor
deshipu commented Oct 10, 2016

You can delete old versions from PyPI.

Another option would be to use an event-based parser (similar to SAX) that doesn't build a tree in memory, but instead calls a callback for every node and leaves it up to the user what to do with the data.

@deshipu
Contributor
deshipu commented Oct 10, 2016

Another option would be to let the user specify which parts of the data they are interested in (for example, using an XPath-like language) up front, before running the parser, and only collect those parts.
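
A minimal sketch of how the two suggestions above could fit together, written in plain Python rather than as anything ujson provides. The names `parse_events` and `collect_paths` are hypothetical, paths are plain tuples instead of a real XPath-like language, and the sample document is made up. The parser reads one character at a time and reports each scalar leaf together with its path, so only the selected leaves are kept.

```python
import io

LITERALS = {"true": True, "false": False, "null": None}
ESCAPES = {"n": "\n", "t": "\t", "r": "\r", "b": "\b", "f": "\f"}


def parse_events(stream, callback):
    """Call callback(path, value) for each scalar leaf; build no tree."""
    state = {"ch": stream.read(1)}  # one character of lookahead

    def advance():
        state["ch"] = stream.read(1)

    def skip_ws():
        while state["ch"] and state["ch"] in " \t\r\n":
            advance()

    def expect(c):
        if state["ch"] != c:
            raise ValueError("expected %r, got %r" % (c, state["ch"]))
        advance()

    def parse_string():
        expect('"')
        out = []
        while state["ch"] != '"':
            ch = state["ch"]
            if not ch:
                raise ValueError("unterminated string")
            if ch == "\\":
                advance()
                ch = state["ch"]
                if ch == "u":  # \uXXXX; surrogate pairs are not combined
                    ch = chr(int(stream.read(4), 16))
                else:
                    ch = ESCAPES.get(ch, ch)
            out.append(ch)
            advance()
        advance()  # consume the closing quote
        return "".join(out)

    def parse_literal():
        out = []
        while state["ch"] and state["ch"] not in ",]} \t\r\n":
            out.append(state["ch"])
            advance()
        text = "".join(out)
        if text in LITERALS:
            return LITERALS[text]
        try:
            return int(text)
        except ValueError:
            return float(text)

    def parse_value(path):
        skip_ws()
        ch = state["ch"]
        if ch == '"':
            callback(path, parse_string())
        elif ch not in ("{", "["):
            callback(path, parse_literal())
        else:
            close = "}" if ch == "{" else "]"
            advance()
            skip_ws()
            if state["ch"] == close:  # empty container: no events
                advance()
                return
            index = 0
            while True:
                if close == "}":
                    skip_ws()
                    key = parse_string()
                    skip_ws()
                    expect(":")
                else:
                    key = index
                    index += 1
                parse_value(path + (key,))
                skip_ws()
                if state["ch"] != ",":
                    break
                advance()
            expect(close)

    parse_value(())


def collect_paths(stream, prefixes):
    """Keep only the leaves whose path starts with one of `prefixes`."""
    found = {}

    def on_leaf(path, value):
        if any(path[:len(p)] == p for p in prefixes):
            found[path] = value

    parse_events(stream, on_leaf)
    return found


doc = '{"info": {"version": "1.0"}, "releases": {"0.9": [{"url": "a"}], "1.0": [{"url": "b"}]}}'
print(collect_paths(io.StringIO(doc), [("info", "version"), ("releases", "1.0")]))
```

The sketch recurses once per nesting level and allocates a new path tuple for each value, so a version aimed at a small heap would want an explicit stack and a cheaper path representation.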

@pfalcon
Contributor Author
pfalcon commented Oct 10, 2016 via email

@dpgeorge
Member

Implemented in e93c1ca.

@stinos
Contributor
stinos commented Oct 13, 2016

Thanks, load/dump was the one thing where we usually had to gc.collect() first to avoid out-of-memory situations.

@pfalcon
Contributor Author
pfalcon commented Oct 13, 2016

Thanks, that was fast! ;-) (I actually didn't expect it to be so fast.)

@dpgeorge
Member

It turned out to be quite a straightforward change (also fun!).

I think we should implement ujson.dump(file) as well. It should be pretty easy/minimal, because the print code can already take an arbitrary stream.
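
As a sketch of what the output side would look like, here is the same contrast for serialization. It uses CPython's `json` as a stand-in, whose signature is `dump(obj, stream)`, and assumes a `ujson.dump` would follow it. The data is made up.

```python
import io
import json

# Made-up data to serialize.
data = {"name": "example-pkg", "versions": ["0.9", "1.0"]}


def save_via_string(obj, stream):
    # Builds the complete JSON text in memory, then writes it in one go.
    stream.write(json.dumps(obj))


def save_via_stream(obj, stream):
    # Lets the encoder write pieces to the stream as they are produced,
    # so no document-sized string has to be allocated.
    json.dump(obj, stream)


buf_a, buf_b = io.StringIO(), io.StringIO()
save_via_string(data, buf_a)
save_via_stream(data, buf_b)
print(buf_a.getvalue() == buf_b.getvalue())
```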
