EHN: Using in-mem temporary files rather than in-disk for building zip archive in _savez #6540
Please read |
1d07e5e to be4e8bc
All tests are passing except the wheel one. Before I squashed the commits, the "wheel" check used to pass too, and my code does not touch distribution at all. Any clue what might be happening? |
Do a force push: |
I haven't read carefully, but I doubt you can turn on in-memory tempfiles by default, because AFAICT they greatly increase peak memory usage so you'll break existing code that tries to save arrays that are, say, 90% of the size of RAM. The real win would be to avoid making a copy of the data at all (e.g. by pointing zipfile directly at the array's internal memory buffer when possible). Can you make one pull request just for the |
@charris done, but that does not trigger the checks. @njsmith at least in Python 3, the buffer is shared between the in-memory file and zipfile. I first want to solve the wheel problem before splitting the PR and defaulting to on-disk temp files. |
b92bcfe to 08359d9
I've split the work between #6545 and the current pull request. |
08359d9 to b76928a
Done: a unit test for the in-memory temp files has been added to test_io.py. |
2f0659e to 22b6fe3
I have a problem with test_load_refcount, which fails randomly even on the master branch. As a result, the Travis check sometimes fails and sometimes passes with exactly the same code. |
22b6fe3 to 7ab52f9
ab18b88 to 1b72b3e
1b72b3e to e9f8188
☔ The latest upstream changes (presumably #7133) made this pull request unmergeable. Please resolve the merge conflicts. |
e9f8188 to 49e07a3
The function _savez stages data in temporary files during archiving. Before this commit, the temp and zip files were handled with explicit open/close calls. This commit switches to the 'with' statement, which has been the recommended style since Python 2.5 and greatly simplifies exception handling and cleanup. Because zipfile in Python 2.6 is incompatible with the 'with' statement, the ZipFile class has been subclassed to add the needed stub methods.
BUG: WinNT prevents a file from being opened twice.
Use contextlib.closing rather than a custom class.
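The pattern described in these commit messages can be sketched roughly as follows. This is a minimal illustration, not the code from the pull request: the helper name `write_members` and the `{name: bytes}` input are hypothetical. It wraps the archive in `contextlib.closing` (which only requires a `close()` method, so it also works where `ZipFile` is not a context manager) and closes each temporary file before `zipfile` reopens it, which sidesteps the Windows restriction on opening a file twice.

```python
import contextlib
import os
import tempfile
import zipfile


def write_members(zip_path, members):
    """Write a {name: bytes} mapping into zip_path via on-disk temp files.

    Hypothetical helper for illustration only; not NumPy's _savez.
    """
    # closing() calls zf.close() on exit, even if an exception is raised.
    with contextlib.closing(zipfile.ZipFile(zip_path, mode="w")) as zf:
        for name, payload in members.items():
            fd, tmp_path = tempfile.mkstemp(suffix=".npy")
            try:
                # Leaving this block closes the temp file, so the archive
                # is the only one to open it afterwards.
                with os.fdopen(fd, "wb") as tmp:
                    tmp.write(payload)
                zf.write(tmp_path, arcname=name)
            finally:
                # Remove the staging file whether or not archiving succeeded.
                os.remove(tmp_path)
```

With explicit open/close calls, each of these cleanup steps would need its own try/finally; the `with` blocks make the ordering (close the temp file, archive it, delete it) visible in the indentation.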
Temporary files are used in the _savez function to stage data during archiving. The kind of temporary file is selected through a new keyword argument, disk_temp_files. If set to True (the default), on-disk temporary files are used. If set to False, in-memory temporary files are used. Please note that the in-memory files are based on BytesIO. In Python 2, BytesIO lacks the getbuffer method, which returns the content of the file without copying it, so getvalue is used there, which is less memory efficient. In Python 3, getbuffer is preferred.
WINNT bug
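The getbuffer/getvalue distinction can be illustrated with a short sketch. The helper name `stage_in_memory` is hypothetical and this is not the pull request's code; it only shows the general idea of handing `zipfile` a view of the `BytesIO` contents when `getbuffer` exists and falling back to the copying `getvalue` otherwise.

```python
import io
import zipfile


def stage_in_memory(zf, name, payload):
    """Stage payload in a BytesIO, then add it to the open ZipFile zf.

    Hypothetical helper for illustration only.
    """
    buf = io.BytesIO()
    buf.write(payload)
    if hasattr(buf, "getbuffer"):
        # Python 3: a memoryview over the internal buffer, no extra copy.
        view = buf.getbuffer()
        try:
            zf.writestr(name, view)
        finally:
            # The view must be released before the BytesIO can be closed.
            view.release()
    else:
        # Python 2: getvalue() returns a copy of the contents.
        zf.writestr(name, buf.getvalue())
    buf.close()
```

Note that even with `getbuffer`, the staged data lives in RAM alongside the array being saved, which is the peak-memory concern raised earlier in the conversation.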
f40ff23 to e17f22f
☔ The latest upstream changes (presumably #7518) made this pull request unmergeable. Please resolve the merge conflicts. |
In Python 3.6 you can write to ZIP files directly, without using temporary files. See #9863. |
After the release of NumPy 1.14.0, this may be achieved by upgrading to Python 3.6, which allows writing streams directly to the zipfile. That seems a better solution, so closing this. Thanks @rherault-insa. |
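For reference, the Python 3.6 feature mentioned above is `ZipFile.open(name, mode="w")`, which returns a writable file-like object for a single archive member. The sketch below uses only the standard library; the helper name `write_stream_member` and the byte chunks are made up for illustration, and a serializer would presumably write to the member handle in the same way.

```python
import io
import zipfile


def write_stream_member(zf, name, chunks):
    """Stream an iterable of bytes chunks straight into a zip member.

    Hypothetical helper for illustration only; requires Python 3.6+.
    """
    # force_zip64=True because the final member size is not known up front.
    with zf.open(name, mode="w", force_zip64=True) as member:
        for chunk in chunks:
            member.write(chunk)


# Usage: no temporary file, on disk or in memory, is needed per member.
archive = io.BytesIO()
with zipfile.ZipFile(archive, mode="w") as zf:
    write_stream_member(zf, "arr_0.npy", [b"header", b"payload"])
```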
This pull request is based on another pull request, #6545.